Wikipedia, the world’s largest encyclopedia, is painstakingly human. Contributions to its pages are labored over and debated by hundreds of thousands of active volunteer editors—some so dedicated they compete to update celebrity deaths first. But in recent years the site has also emerged as a backbone of A.I., with its data increasingly relied upon by the technology’s large language models (LLMs) like OpenAI’s ChatGPT.
In the early days of LLMs, tech companies told Wikipedia they only needed to train their models on its data once a year. “That’s how it started,” Lane Becker, senior director of earned revenue at the Wikimedia Foundation, told Observer. “Now, it is rapacious,” he added. As Wikipedia becomes more central to the infrastructure of A.I., the organization is grappling with rising bot traffic, the need for attribution and how to sustain its ecosystem in the face of powerful new users.
Wikipedia is one of the ten most-visited websites in the world. It doesn’t run ads and offers its content for free, relying instead on donations from readers. The nonprofit Wikimedia Foundation oversees the site but takes a mostly hands-off approach to its rules and content moderation. Wikipedia’s vast, structured troves of information are almost tailor-made for A.I. As of July 2023, every single major LLM had been trained on Wikipedia, according to the Foundation. While the team initially worried that A.I.-generated answers might reduce traffic to the site, Becker says “we haven’t seen any significant decline related to A.I. traffic.”
Attribution, however, remains a sticking point. Citations not only give credit but also help Wikipedia attract new editors and donors. “If our content is getting sucked into an LLM without attribution or links, that’s a real problem for us in the short term,” said Becker. “In the medium and long term, it’s a real problem for these A.I. systems because they need us to keep creating this content.” He added that bot traffic pulling Wikipedia data is at an all-time high. “We are clearly entering an era where the level of automated traffic to our service is rising—and it isn’t going to stop.”
Becker hopes A.I. firms will eventually support Wikipedia’s survival with funding and policy commitments. “How do we continue to support an open, free content and knowledge ecosystem while also recognizing some of these companies really want a lot out of us?”
Coming up against powerful critics
LLMs aren’t Wikipedia’s only challenge. In recent months, the site has come under attack by prominent conservative voices in the U.S.
Elon Musk—once a fan—has since become a vocal critic, citing frustrations with how Wikipedia describes his role at Tesla (TSLA) and its account of a controversial salute he made in January. Earlier this year, Musk told his followers on X to “defund” Wikipedia, accusing it of left-leaning bias.
Such criticism is nothing new. “We’re used to this type of thing around the world, frankly,” said Rebecca MacKinnon, vice president of global advocacy at the Wikimedia Foundation. But in the A.I. era, she notes, Wikipedia’s content matters more than ever. “I could come up with a whole list of countries where we’ve heard from powerful people who don’t like their Wikipedia page,” she said. “But their Wikipedia page matters a lot more than it used to. The stakes have gotten higher for everyone.”
That’s one reason the Foundation is pushing harder for protections—both for the platform and its editors, many of whom use anonymous usernames to avoid harassment. “Protecting their privacy is a critical element of the work that we do,” Becker said.
Founded in 2001, Wikipedia has weathered its share of crises. “Every year there’s something, right?” said MacKinnon, pointing to China’s 2019 ban of the platform as just one of many examples. “The community is always thinking and debating about new things that come up and how to adapt,” she added. “We’ve just got to keep on trucking.”