The Wikidata revolution is here: enabling structured data on Wikipedia

April 25, 2013
The Wikidata logo

This post was written by Tilman Bayer, Senior Operations Analyst of the Wikimedia Foundation. It was originally posted on the Foundation’s blog.

A year after its announcement as the first new Wikimedia project since 2006, Wikidata has begun to serve the more than 280 language versions of Wikipedia as a common source of structured data that can be used in more than 25 million articles of the free encyclopedia.

By providing Wikipedia editors with a central venue for their efforts to collect and vet such data, Wikidata leads to a higher level of consistency and quality in Wikipedia articles across the many language editions of the encyclopedia. Beyond Wikipedia, Wikidata’s universal, machine-readable knowledge database will be freely reusable by anyone, enabling numerous external applications.

“Wikidata is a powerful tool for keeping information in Wikipedia current across all language versions,” said Wikimedia Foundation Executive Director Sue Gardner. “Before Wikidata, Wikipedians needed to manually update hundreds of Wikipedia language versions every time a famous person died or a country’s leader changed. With Wikidata, such new information, entered once, can automatically appear across all Wikipedia language versions. That makes life easier for editors and makes it easier for Wikipedia to stay current.”

The Wikidata entry on Johann Sebastian Bach (as displayed in the “Reasonator” tool), containing among other data the composer’s places of birth and death, family relations, entries in various bibliographic authority control databases, a list of compositions, and public monuments depicting him

The dream of a wiki-based, collaboratively edited repository of structured data that could be reused in Wikipedia infoboxes goes back to at least 2004, when Wikimedian Erik Möller (now the deputy director of the Wikimedia Foundation) posted a detailed proposal for such a project. The following years saw work on related efforts like the Semantic MediaWiki extension, and discussions of how to implement a central data repository for Wikimedia intensified in 2010 and 2011.

The development of Wikidata began in March 2012, led by Wikimedia Deutschland, the German chapter of the Wikimedia movement. Since Wikidata.org went live on 30 October 2012, a growing community of around 3,000 active contributors has been building its database of ‘items’ (e.g. things, people or concepts), starting by collecting topics that are already the subject of Wikipedia articles in several languages. Each item’s central page on Wikidata replaces the complex web of language links that previously connected the articles about the same topic in different Wikipedia versions.

Wikidata’s collection of these items now numbers over 10 million. The community has also begun to enrich Wikidata’s database with factual statements about these topics (data like the mayor of a city, the ISBN of a book, the languages spoken in a country, etc.). This information is now available for use on Wikipedia itself, and editors on many language editions of Wikipedia have already started adding it to articles or are discussing how best to make use of it.

“It is the goal of Wikidata to collect the world’s complex knowledge in a structured manner so that anybody can benefit from it,” said Wikidata project director Denny Vrandečić. “Whether that’s readers of Wikipedia who are able to be up to date about certain facts or engineers who can use this data to create new products that improve the way we access knowledge.”

The next phase of Wikidata will allow for the automatic creation of lists and charts based on the data in Wikidata. Wikimedia Deutschland will continue to support the project with an engineering team that is dedicated to Wikidata’s second year of development and maintenance.

Wikidata is operated by the Wikimedia Foundation, and its fact database is published under the Creative Commons Zero (CC0) public domain dedication. Funding for Wikidata’s initial development was provided by the Allen Institute for Artificial Intelligence ([AI]²), the Gordon and Betty Moore Foundation and Google, Inc.

Tilman Bayer, Senior Operations Analyst, Wikimedia Foundation
