BERT: Bidirectional Language Understanding

What Happened

Google published BERT (Bidirectional Encoder Representations from Transformers), which could understand language context from both directions simultaneously. BERT shattered records on 11 NLP benchmarks, and Google later integrated it into Search, where it affected about 10% of English-language queries.

Why It Mattered

BERT transformed NLP almost overnight. Pre-training followed by fine-tuning became the dominant paradigm, and BERT showed how transformer-based language models could capture context with a depth that reset expectations for search and language understanding.
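The "both directions" claim can be made concrete with attention masks. A minimal sketch (illustrative only, not BERT's implementation): in a bidirectional encoder every position may attend to every other position, while a left-to-right model restricts each position to its past.

```python
import numpy as np

seq_len = 4

# Bidirectional (BERT-style) mask: every position may attend to every
# other position, so context flows from both left and right.
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)

# Causal (left-to-right) mask for comparison: position i may only
# attend to positions <= i, so context flows from the left alone.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

# At position 0, a bidirectional encoder already sees the whole
# sequence; a causal model sees only the first token.
print(bidirectional_mask[0].sum())  # 4
print(causal_mask[0].sum())         # 1
```

In practice these masks are applied to the attention scores before the softmax, which is why the same Transformer machinery can serve both encoder-style (BERT) and decoder-style (GPT) models.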

Related Milestones

[Figure: the Transformer architecture diagram from 'Attention Is All You Need']

Attention Is All You Need: The Transformer

Eight researchers at Google published 'Attention Is All You Need,' introducing the Transformer architecture. It replaced recurrence with self-attention mechanisms that could process entire sequences in parallel. The paper's title was deliberately bold — and proved prescient.

Ashish Vaswani, Noam Shazeer (Google Brain, Google Research)
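The self-attention mechanism described above can be sketched in a few lines. This is a minimal NumPy rendering of the paper's scaled dot-product attention, not a production implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V (Vaswani et al., 2017)."""
    d_k = K.shape[-1]
    # Pairwise similarity of every query with every key, computed in
    # one matrix product -- this is what allows full parallelism
    # across sequence positions, unlike a recurrent network.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 query positions, d_k = 8
K = rng.normal(size=(5, 8))   # 5 key positions
V = rng.normal(size=(5, 8))
out, w = scaled_dot_product_attention(Q, K, V)
# Each query's attention weights form a distribution over the 5 keys.
```

Because every output position is a weighted mixture over all key positions, the whole sequence is processed in one batched matrix multiply rather than step by step.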

GPT-1: Generative Pre-training

OpenAI released GPT-1, demonstrating that a Transformer trained on vast amounts of text using unsupervised pre-training could then be fine-tuned for specific NLP tasks. With 117 million parameters, it showed the potential of scaling language models.

Alec Radford (OpenAI)

GPT-2: 'Too Dangerous to Release'

OpenAI announced GPT-2 (1.5 billion parameters) but initially refused to release the full model, calling it 'too dangerous' due to its ability to generate convincing fake text. The decision was controversial — some praised the caution, others called it a publicity stunt. The full model was eventually released in November 2019.

Alec Radford (OpenAI)

GPT-3: The 175 Billion Parameter Leap

OpenAI released GPT-3 with 175 billion parameters — more than 100x larger than GPT-2. Without any fine-tuning, GPT-3 could write essays, code, and poetry, translate languages, and answer questions through 'few-shot learning' (learning from just a few examples in the prompt). The API launched in beta, enabling thousands of applications.

Tom Brown (OpenAI)
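The few-shot pattern described above is just prompt construction: a handful of worked examples are packed into the input, and the model continues the pattern. A hypothetical sketch (the helper name and translation pairs are illustrative, not an OpenAI API):

```python
# Illustrative few-shot prompt builder. build_few_shot_prompt is a
# made-up helper, not part of any OpenAI library.
examples = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
]

def build_few_shot_prompt(examples, query):
    """Format English->French pairs as a single few-shot prompt."""
    lines = ["Translate English to French:"]
    for en, fr in examples:
        lines.append(f"{en} => {fr}")
    # The prompt ends mid-pattern; the model is expected to complete it.
    lines.append(f"{query} =>")
    return "\n".join(lines)

prompt = build_few_shot_prompt(examples, "plush giraffe")
print(prompt)
```

No gradient updates are involved: the "learning" happens entirely in-context, which is what made GPT-3's few-shot results so striking.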

AlphaFold 2: Protein Folding Solved

DeepMind's AlphaFold 2 achieved accuracy comparable to experimental methods at CASP14, effectively solving the 50-year-old problem of predicting how a protein folds from its amino acid sequence — a challenge that had stumped biologists for half a century.

John Jumper, Demis Hassabis (DeepMind)
