Using bots to change the landscape of Wikipedia

  • January 14, 2019
A robot by Banksy in New York – image by Scott Lynch CC BY-SA 2.0

This post has been written by User:TheSandDoctor, an admin on English Wikipedia. An original version of this article appeared on Medium.

A Request for Comment (RfC) is a process for requesting outside input concerning disputes, policies, guidelines or article content. As an admin on the English Wikipedia, I deal with these kinds of bureaucratic issues regularly.

For a bot task to be approved on the English Wikipedia, a request, called a Bot Request For Approval (BRFA), must be filed. If the task is judged to be sufficiently needed, a member of the Bot Approvals Group, the body that oversees bots, will generally request a trial. If the trial goes to plan, the task is usually approved within a couple of days of the trial’s completion. If there are issues, the submitter(s) resolve them and notify the reviewing member(s), potentially followed by a new trial. If the retrial goes according to plan, the task is most likely approved shortly thereafter.

After a successful Request for Comment, I knew it was time to get to work on my next Wikipedia bot. Little did I realize at the time that this would be the most controversial task I had filed to date, and that it would trigger an unprecedented series of events I never predicted, culminating in the rare re-opening of a Request for Comment. The change that resulted in this series of events? Moving the year an election or other referendum took place from the end to the front of the page name.

For example, “United States presidential election, 2016” would become “2016 United States presidential election”, and “Electoral fraud and violence during the Turkish general election, June 2015” would be renamed “Electoral fraud and violence during the June 2015 Turkish general election”, with the old titles remaining valid redirects so as to avoid breaking any incoming links.
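The renaming rule in these two examples can be sketched in a few lines of Python. This is only an illustration of the title change, not the code TheSandBot actually ran: the function name `year_first_title`, the `lead_in` parameter and the regular expression are all assumptions made for this sketch, and it only handles titles that end in a comma followed by an optional month and a four-digit year.

```python
import re
from typing import Optional

# Assumed title shape: "<election name>, <optional month> <four-digit year>"
TRAILING_DATE = re.compile(r"^(?P<name>.+), (?P<date>(?:[A-Z][a-z]+ )?\d{4})$")


def year_first_title(title: str, lead_in: str = "") -> Optional[str]:
    """Return the year-first form of a title, or None if it does not apply.

    lead_in (hypothetical parameter) is text that must stay in front of the
    date, such as "Electoral fraud and violence during the ".
    """
    if not title.startswith(lead_in):
        return None
    match = TRAILING_DATE.match(title[len(lead_in):])
    if match is None:
        # No trailing ", <date>" part, e.g. the title is already year-first.
        return None
    return f"{lead_in}{match['date']} {match['name']}"


print(year_first_title("United States presidential election, 2016"))
# 2016 United States presidential election
print(year_first_title(
    "Electoral fraud and violence during the Turkish general election, June 2015",
    lead_in="Electoral fraud and violence during the ",
))
# Electoral fraud and violence during the June 2015 Turkish general election
```

The second example shows why a blanket "move the year to the very front" rule is not enough: the date belongs directly before the election's name, so any text that precedes the name has to be identified separately.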

It was October 17, 2018 and the approval request opened as countless others I had filed in the past had, with routine questions being asked by a volunteer Bot Approvals Group member, in this case the user named SQL. At this point, there were some indications that this would not go as smoothly as I had previously experienced. It was slightly unusual when the normally quiet and routine process began to attract more attention from editors and other members of the Bot Approvals Group, who began to express concerns regarding the RfC itself. In particular, concerns were expressed that there was not enough participation in the original Request for Comment and that it was inadequately advertised at the various relevant noticeboards watched by editors who might be affected by the proposed change to the article naming convention. By October 20th, the unprecedented happened. The decision was made to reopen the Request for Comment, and the discussion kicked off once again, with the bot approval request taking a temporary backseat. The reopening of a Request for Comment is a fairly unusual measure that, while possible, is seldom done or deemed necessary.

Following the RfC’s reopening, there was thorough discussion on both sides of the debate, which lasted an additional 31 days. On November 20th, 2018, the findings of the original close were confirmed. The consensus was that the naming convention was to be updated as proposed and, as a direct side effect, the bot task which I had submitted was given a renewed life. The upholding of the initial close, this time with clearer support, effectively cleared the way for a trial run. It was decided on the task’s discussion page that roughly 150 articles would be renamed in the trial of my task approval request. The task to move the pages to correspond with the updated naming conventions was approved on November 27th, following the successful completion of the trial and after a few days’ holding time for any further comments or technical concerns.

From November 27th until early December, TheSandBot enacted the consensus achieved by the Request for Comment, moving (renaming) over 43,000 election-related pages within a couple of days.

When a page is moved (renamed), MediaWiki, the wiki software that Wikipedia uses, creates a redirect from the old title to the new one. This prevents links to the older title from breaking: instead of receiving the equivalent of an HTTP 404 error, readers who follow an old link are simply redirected to the new location. Move operations have either two or four parts, each of which takes one edit. In the former case, the two parts are a redirect page creation and a move, so two edits are performed for every ‘move’. The latter case is slightly more complicated, but the actions are doubled. Taking advantage of this property, I was able to save time and reduce the size of the task script. As a consequence, although approximately 21,000 articles were moved, the logs record 43,000 moves and over 86,000 edits within that time frame (see figures above/below).
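The relationship between these figures can be checked with a little arithmetic. The sketch below assumes that a four-part operation is recorded as two logged moves of two edits each, which is one reading of "the actions are doubled"; the function and constant names are made up for illustration. It uses the rounded figure of 21,000 articles, so it lands slightly under the logged totals of 43,000 and 86,000.

```python
EDITS_PER_MOVE = 2       # one redirect creation plus the move itself
MOVES_PER_FOUR_PART = 2  # assumption: a four-part operation is two logged moves


def logged_moves(pages: int, four_part: bool = True) -> int:
    """Number of move-log entries produced by renaming `pages` articles."""
    return pages * (MOVES_PER_FOUR_PART if four_part else 1)


def total_edits(pages: int, four_part: bool = True) -> int:
    """Number of edits registered for renaming `pages` articles."""
    return logged_moves(pages, four_part) * EDITS_PER_MOVE


articles = 21_000  # "approximately 21,000" from the text, so results are approximate
print(logged_moves(articles))  # 42000
print(total_edits(articles))   # 84000
```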

From left to right: total number of edits over the account lifetime, further statistics regarding the edits made within the past year.

An example of the four edits per page move mentioned above. N signifies a page creation; m signifies a minor edit, which is how the software automatically marks page moves.

With the successful completion of all the specified page moves, that particular task has come to an end. Now it is time for me to move on to different ones, like the recently approved task of removing article-specific templates from drafts. There is always more work to do within the largest online encyclopedia that is Wikipedia.

Find out more about TheSandDoctor’s work at thesanddoctor.com.
