Reasoning
7 milestones in AI history
Logic Theorist: The First AI Program
Allen Newell and Herbert Simon, working with programmer Cliff Shaw, created the Logic Theorist, often called the first AI program. It proved 38 of the first 52 theorems in chapter 2 of Whitehead and Russell's Principia Mathematica, and for one theorem it found a more elegant proof than the original. It was presented at the 1956 Dartmouth workshop.
SHRDLU: Natural Language Understanding
Terry Winograd created SHRDLU, a program that could understand and respond to English commands about a simulated 'blocks world.' Users could ask it to move objects, answer questions about their arrangement, and even understand pronouns and context within its limited domain.
GPT-4: Multimodal Intelligence
OpenAI released GPT-4, a multimodal model that accepts both text and image input and produces text. According to OpenAI's reported results, it scored around the 90th percentile on a simulated bar exam and 1410 on the SAT. It was a substantial improvement over GPT-3.5 in accuracy, safety, and capability.
OpenAI o1: Reasoning Models
OpenAI released o1, a model trained to 'think before it speaks' using chain-of-thought reasoning at inference time. It could solve complex math, coding, and science problems by spending more compute working through multi-step solutions, trading speed for accuracy on hard problems.
DeepSeek R1: Open-Source Reasoning
Chinese AI lab DeepSeek released R1, a reasoning model with openly released weights that approached the performance of OpenAI's o1 at a fraction of the cost. Trained on a reportedly modest compute budget, it challenged the assumption that frontier reasoning requires the very largest investments in compute.
Claude 4 / Opus 4: Frontier Reasoning
Anthropic released Claude Opus 4, a model with significantly enhanced reasoning, extended thinking capabilities, and the ability to sustain complex multi-step problem-solving over long contexts. It excelled at agentic tasks, code generation, and nuanced analysis.
OpenAI o3: Advanced Reasoning at Scale
OpenAI released o3, the successor to o1, with markedly improved reasoning capabilities. It posted state-of-the-art results on many math and coding benchmarks and handled problems that previously required expert-level multi-step analysis.