2 results found

InstructGPT, introduced in OpenAI's 2022 paper, revolutionized LLM development by shifting focus from raw capability to alignment. It fine-tuned GPT-3 using Reinforcement Learning from Human Feedback (RLHF) to make models more helpful, honest, and harmless. This multi-stage pipeline, involving supervised fine-tuning, reward model training, and PPO, taught LLMs to follow human instructions consistently, leading to the foundation of modern conversational AI like ChatGPT.

DeepMind veteran David Silver has secured an unprecedented $1.1 billion in funding for his new British AI lab, Ineffable Intelligence, at a $5.1 billion valuation. The company aims to build a "superlearner" AI that acquires knowledge and skills purely through reinforcement learning, without relying on human data, a radical departure from current large language models.