Language Model
5 milestones in AI history
GPT-1: Generative Pre-training
In June 2018, OpenAI released GPT-1, demonstrating that a Transformer trained on vast amounts of text with unsupervised pre-training could then be fine-tuned for specific NLP tasks. With 117 million parameters, it showed the potential of scaling language models.
GPT-2: 'Too Dangerous to Release'
In February 2019, OpenAI announced GPT-2 (1.5 billion parameters) but initially withheld the full model, calling it 'too dangerous' because of its ability to generate convincing fake text. The decision was controversial — some praised the caution, others called it a publicity stunt. The full model was eventually released in November 2019.
GPT-3: The 175 Billion Parameter Leap
In June 2020, OpenAI released GPT-3 with 175 billion parameters — more than 100x larger than GPT-2. Without any fine-tuning, GPT-3 could write essays, code, and poetry, translate languages, and answer questions through 'few-shot learning' (learning from just a few examples supplied in the prompt). The API launched in beta, enabling thousands of applications.
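The few-shot idea above amounts to demonstrating a task inside the prompt itself rather than updating the model's weights. A minimal sketch of how such a prompt is assembled — the sentiment-labeling task, the `Review:`/`Sentiment:` format, and the helper name are illustrative assumptions, not the GPT-3 paper's exact prompts:

```python
# Illustrative sketch of few-shot prompt construction: the task is shown
# via a handful of (input, output) demonstrations, and the model is asked
# to complete the pattern for a new input. Format and task are hypothetical.

def build_few_shot_prompt(examples, query):
    """Assemble a prompt from (input, label) demonstration pairs plus a new query."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    # The final block is left open so the model continues after "Sentiment:".
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

examples = [
    ("A delightful, moving film.", "positive"),
    ("Two hours I will never get back.", "negative"),
]
prompt = build_few_shot_prompt(examples, "An instant classic.")
print(prompt)
```

Sending a prompt shaped like this to a completion endpoint is what the entry means by "learning from just a few examples in the prompt": the demonstrations condition the model's next-token predictions without any gradient updates.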
Llama 2: Meta Opens the Floodgates
In July 2023, Meta released Llama 2, a family of widely available large language models (7B, 13B, and 70B parameters) distributed as open weights under a custom license that allowed broad commercial use. While not open source in the strict OSI sense, it gave companies and researchers access to a frontier-quality model they could run, customize, and deploy themselves.
Llama 3: Open-Source Catches Up
In April 2024, Meta released Llama 3 (8B and 70B, followed by a 405B model that July), closing the gap with closed frontier models. The 405B release put a near-frontier open-weight model into more developers' hands, even though Meta's licensing still sat outside a strict open-source definition.