Talkie AI: 13B Vintage Model Speaks Like 1920s Anchor

Talkie, a 13‑billion‑parameter AI trained on pre‑1930 text, mimics a 1920s news anchor but reveals post‑1930 facts, highlighting data leakage issues.

TL;DR: Talkie, the largest AI trained only on pre‑1930 material, chats like a 1920s newsreader but anachronistically cites post‑1930 facts, exposing data contamination.

Context
A team of researchers released Talkie, an AI language model built exclusively from books, newspapers and other texts published before 1930. With 13 billion parameters, the most of any “vintage” model, it adopts the Mid‑Atlantic diction and rapid‑fire news‑anchor style of early sound films.

Key Facts
Talkie lacks a system prompt, an instruction supplied to the model ahead of each conversation that tells it its limits. Without one, the model cannot recognize that its training stops at 1929, according to computer‑science professor David Duvenaud. The result is a chatbot that behaves as if it can discuss any era.

The model also exhibits “temporal leakage,” the term for an AI trained on historical data inadvertently reproducing later information. Talkie identified Franklin D. Roosevelt as U.S. president from 1933 to 1937, a fact that post‑dates its training window. The leak shows that the dataset was not perfectly confined to pre‑1930 sources.

Talkie can also generate one‑line code snippets, though researchers say its programming ability remains rudimentary.

When asked about “talking pictures,” the model called them “overrated” and predicted they would never replace silent films, reflecting a skepticism common in its source era.

User tests produced quirky forecasts: a second world war in 1936, everyday “flying machines,” and a sun that would stop shining by 1999. These predictions echo the anxieties and expectations of the 1920s rather than genuine foresight.

What It Means
Talkie demonstrates both the novelty and the difficulty of building historically bounded AI. Its authentic‑sounding voice offers a fresh conversational experience, yet temporal leakage undermines claims of a pure vintage perspective. The episode raises broader questions about how AI models extrapolate beyond their training data and whether they could independently rediscover scientific breakthroughs that came after their cutoff. Future work will need stricter data curation, along with mechanisms that make a model aware of its own cutoff, to prevent anachronistic output.

What to watch next: Researchers plan to test tighter cutoff controls and to explore whether a model limited to texts published up to 1911 could independently articulate concepts like general relativity, which Einstein published in 1915.
