
Pictures That Lie

The slide that got history wrong

In a teacher-training deck, ChatGPT was asked to “create a multimedia presentation on the Mexican Revolution.” The slide dutifully appeared with a caption—“This image highlights significant figures and moments”—over a glossy collage in which no one resembled Pancho Villa, Emiliano Zapata, Francisco Madero, or Porfirio Díaz. Common Sense Media’s Robbie Torney later described the example as “sophisticated fiction, not factual representations,” and both OpenAI and Common Sense Media said the training would be revised. It’s a small scene from Bloomberg Businessweek’s reporting on how AI is already seeping into K-12 practice—but it’s the perfect parable for why text-to-image tools and classrooms don’t mix without serious guardrails.

When generators guess at the past

Text-to-image systems are astonishingly good at aesthetic mimicry. Ask for “a storm over the Zócalo in oil-painted chiaroscuro” and you’ll get an image that looks like a lost Baroque canvas. Ask for “Zapata confers with Villa in 1914” and you’ll get handsome strangers in period costume. Under the hood, diffusion models do not recall photographs; they synthesize pixels to satisfy a statistical description of your words. When the target is history—specific people, uniforms, insignia, places, dates—the model does what it always does: it guesses. Sometimes those guesses are beautiful. Often, they’re wrong. Research over the past two years has cataloged these failure modes: object “hallucinations,” miscounted crowds, scrambled spatial relationships, and a persistent inability to bind the correct attributes to the correct objects. In other words, the generator can draw a moustache and a sombrero—but not the moustache, on the right face, in the right year.
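To see the guessing in action, here is a minimal sketch using the open-source Hugging Face diffusers library with a Stable Diffusion checkpoint (chosen purely for illustration; the checkpoint name, prompt, and seeds are assumptions, not the tool used in the training deck). The same historical prompt, rendered with two different random seeds, yields two different invented scenes, because the model samples an image out of noise instead of retrieving one.

```python
# Minimal sketch: the same prompt, two seeds, two different fictions.
# Assumes the `diffusers` library and the public "stabilityai/stable-diffusion-2-1-base"
# checkpoint; a GPU helps but is not required (it will just be slow on CPU).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base")
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

prompt = "Emiliano Zapata confers with Pancho Villa, Mexico City, 1914, archival photograph"

for seed in (7, 8):
    generator = torch.Generator(device=pipe.device.type).manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"zapata_villa_seed_{seed}.png")
    # Each file shows a plausible period scene with different, fictional faces:
    # the prompt constrains style and props, not identity.
```

Compare the two files side by side: the sombreros and sepia tones stay consistent, the faces do not.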

Why the Mexican Revolution prompt was doomed from the start

Even if you describe Villa or Zapata perfectly, mainstream tools impose safety and policy constraints that quietly derail accuracy. OpenAI’s DALL·E 3, for example, includes mitigations that decline requests for public figures by name. That’s sensible for privacy and abuse prevention, but disastrous for historical fidelity: you asked for Zapata; the model gave you “generic revolutionary.” Add to that the messiness of training data—billions of scraped image-text pairs with inconsistent captions—and you get composites that look plausible while drifting from the truth. Stable Diffusion’s own documentation points to LAION-5B as a core source; the set is vast and useful, but its captions are noisy, aesthetically filtered, and not curated for pedagogy or historical accuracy. In aggregate, that is a recipe for “museum-ish” imagery with the wrong people in the frame.

The evidence isn’t hypothetical

When Google’s Gemini rolled out image generation for people, it soon had to pause the feature after producing historically inaccurate depictions—diverse Vikings, reinvented popes, and anachronistic soldiers—because the model overcorrected for representation in contexts where accuracy mattered most. Google apologized and kept the capability offline until it could release an improved version. That wasn’t just an internet culture-war flare-up; it was a public demonstration of a technical reality: these systems are pattern painters, not historians.

Why pictures that lie are worse than paragraphs that waffle

Educators know that images leave deep grooves in memory. The “picture superiority effect” means that students tend to remember pictures better than words, and repeated exposure to an image also amplifies the illusory truth effect. If the slide is wrong, any later correction has to fight a stickier, more vivid memory—especially dangerous when the subject is identity, culture, or the contested story of a nation. In a classroom, a persuasive fake portrait isn’t harmless flair; it’s a misfiled fact that keeps resurfacing.

What diffusion actually does—and why it trips on facts

Diffusion models start with noise and iteratively denoise toward an image that best matches your text embedding—a vector representation learned from vast text-image pairs. Two things go sideways for education. First, composition: the model often fails at counting (“three rifles,” “five banners”), spatial relations (“Díaz to the left of Madero”), and attribute binding (“black sash on Zapata, not Villa”). These are well-documented research failures that directly map to history and science prompts. Second, grounding: the model lacks a canonical registry of “what Villa’s face is” because it does not retrieve a source; instead, it averages across look-alikes in noisy data while a safety layer strips away proper names. The outcome is photorealistic fiction with the confidence of a textbook plate.
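For readers who want the mechanics spelled out, here is a condensed sketch of that loop assembled from Stable Diffusion's open-source components in the diffusers and transformers libraries. The checkpoint, step count, and scheduler are illustrative assumptions, and classifier-free guidance is omitted for brevity; the point to notice is that nothing in the loop ever consults an archive of Villa or Zapata.

```python
# Condensed sketch of text-to-image diffusion: encode the prompt, start from noise,
# repeatedly denoise toward the text embedding, then decode. Assumes the `diffusers`
# and `transformers` libraries and the "stabilityai/stable-diffusion-2-1-base" checkpoint.
import torch
from diffusers import AutoencoderKL, DDIMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "stabilityai/stable-diffusion-2-1-base"  # assumed, for illustration
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")

# 1. The prompt becomes a text embedding: a statistical description, not a citation.
tokens = tokenizer(["Emiliano Zapata confers with Pancho Villa, 1914"],
                   padding="max_length", max_length=tokenizer.model_max_length,
                   truncation=True, return_tensors="pt")
with torch.no_grad():
    text_emb = text_encoder(tokens.input_ids).last_hidden_state

# 2. The "image" starts as pure Gaussian noise in the model's latent space.
scheduler.set_timesteps(50)
size = unet.config.sample_size
latents = torch.randn(1, unet.config.in_channels, size, size) * scheduler.init_noise_sigma

# 3. Each step predicts and removes a little noise, conditioned on the embedding.
#    The UNet only asks: what would make this look more like images whose captions
#    resembled the prompt? No photograph of Villa is ever retrieved.
for t in scheduler.timesteps:
    latent_in = scheduler.scale_model_input(latents, t)
    with torch.no_grad():
        noise_pred = unet(latent_in, t, encoder_hidden_states=text_emb).sample
    latents = scheduler.step(noise_pred, t, latents).prev_sample

# 4. Decode the final latent into pixels: photorealistic fiction with textbook confidence.
with torch.no_grad():
    image = vae.decode(latents / vae.config.scaling_factor).sample
```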

Schools are racing ahead; the guardrails are chasing

U.S. schools are rapidly experimenting with AI, and official guidance has been working to keep pace. The Department of Education’s 2023 report cautioned that AI should augment—not replace—human judgment, and Common Sense Media’s K-12 brief advises using generative tools for creative exploration rather than as an oracle for facts. UNESCO’s own guidance echoes this caution. The Mexican Revolution slide is exactly the kind of “looks right, is wrong” output those documents warn about.

Better than pretending: how to teach with images in the AI era

There is a responsible path, but it starts by acknowledging what these models are bad at. If an assignment requires historical fidelity—such as people, uniforms, insignia, and street scenes tied to specific dates—students should work from primary sources, licensed archives, or educator-vetted image sets. If AI imagery is used, it must be framed as an illustration, labeled as synthetic, and paired with citations to the real thing. Provenance tech like C2PA Content Credentials can help students and teachers track what is AI-generated at a glance, and research on retrieval-augmented image generation shows promise for grounding rare or specific entities—but those are emerging practices, not defaults in classroom tools today.
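For what a provenance check might look like in practice today, here is a small sketch that shells out to the open-source c2patool command-line tool from the Content Authenticity Initiative to print an image's Content Credentials, if any are present. The tool name, its JSON output, and the error handling are assumptions about current tooling rather than a classroom-ready product; most school platforms do not surface this information yet.

```python
# Sketch of a C2PA provenance check. Assumes the open-source `c2patool` CLI is
# installed and on PATH, and that it prints a JSON manifest report for files that
# carry Content Credentials; both are assumptions about current tooling.
import json
import subprocess
import sys

def content_credentials(path: str) -> dict | None:
    """Return the image's C2PA manifest report as a dict, or None if it has none."""
    result = subprocess.run(["c2patool", path], capture_output=True, text=True)
    if result.returncode != 0:
        return None  # no manifest embedded, or the file could not be read
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError:
        return None  # tool produced something other than JSON

if __name__ == "__main__":
    manifest = content_credentials(sys.argv[1])
    if manifest is None:
        print("No Content Credentials found; provenance unknown, treat with care.")
    else:
        # A manifest typically names the generator or editing tool and its claims,
        # which is the cue to label the image as synthetic in class materials.
        print(json.dumps(manifest, indent=2))
```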

The real lesson of that bad slide

The problem isn’t that AI can’t make pictures; it’s that schools are letting generative models author the past. In the rush to appear modern, a training module asked a probability machine to invent the faces of revolutionaries—and then presented the invention as instruction. The fix isn’t another prompt. It’s pedagogy. Until AI image tools are grounded by design and classrooms are fluent in provenance, the safest policy is simple: if accuracy matters, don’t outsource the picture. And if you do use AI art, teach it as art—never as evidence.


©2025 Copyright by Markus Brinsa | Chatbots Behaving Badly™

Sources

  1. Bloomberg Businessweek (Vauhini Vara), “AI and Chatbots Are Already Reshaping US Classrooms,” Sept. 1, 2025 bloomberg.com
  2. Apple News+ Narrated listing for “How chatbots and AI are already transforming kids’ classrooms” podcasts.apple.com
  3. OpenAI — DALL·E 3 announcement (public-figure mitigation and safety) openai.com
  4. Stability AI — Stable Diffusion launch (training on LAION-5B aesthetics subset) stability.ai
  5. Stability AI — Stable Diffusion v2 release stability.ai
  6. LAION — LAION-5B dataset description laion.ai
  7. Google — “Gemini image generation got it wrong. We’ll do better.” (pause for people images) blog.google
  8. Reuters — Google pauses Gemini people-image generation after inaccuracies reuters.com
  9. AP — Google says Gemini sometimes “overcompensated” for diversity (historical inaccuracies) apnews.com
  10. U.S. Department of Education (2023), “Artificial Intelligence and the Future of Teaching and Learning” (PDF) ed.gov
  11. Common Sense Media (Aug. 28, 2024), “Generative AI in K-12 Education” white paper (PDF) commonsensemedia.org
  12. UNESCO (2023), “Guidance for Generative AI in Education and Research” unesco.org
  13. Zhuang et al. (NeurIPS 2024), “We Never Know How Text-to-Image Diffusion Models Work (but can we tweak them anyway?)” (PDF) proceedings.neurips.cc
  14. Binyamin et al. (CVPR 2025), “Make It Count: Text-to-Image Generation with an Accurate Number of Objects” (PDF) openaccess.thecvf.com
  15. Han et al. (2024), “Object-Attribute Binding in Text-to-Image Generation” (arXiv) arxiv.org
  16. Ding et al. (CCS 2024), “Understanding Implosion in Text-to-Image Generative Models” (PDF) people.cs.uchicago.edu
  17. Lim et al. (2024), “Factual Text-to-Image Generation” (PDF; retrieval-augmented T2I survey) arxiv.org
  18. Chen et al. (2022), “Re-Imagen: Retrieval-Augmented Text-to-Image Generator” (arXiv) arxiv.org
  19. Nielsen Norman Group (2024), “The Picture-Superiority Effect” nngroup.com
  20. Ecker et al. (2022), “The psychological drivers of misinformation belief…” (Nature Reviews Psychology) nature.com
  21. C2PA — Content Credentials standard c2pa.org
  22. OpenAI — “Understanding the source of what we see and hear online” (provenance efforts) openai.com

About the Author