A Q&A With Phrase Detectives Developers
Updated: Feb 27
Phrase Detectives is a game on Lingo Boingo that collects data on how humans understand anaphora, which is the act of referring to information provided elsewhere in a text. This blog post features a Q&A with two of the Phrase Detectives developers, Massimo Poesio and Udo Kruschwitz.
What inspired the creation of Phrase Detectives and what was the development process?
Phrase Detectives was born from our research on anaphora. We started to realize that our subjects disagreed quite a lot on the interpretation of anaphoric expressions, particularly in conversations. After running a number of traditional intercoder agreement studies with a large number of subjects, we started to think about ways to collect such data on a larger scale but in a cost-effective way, so we thought about crowdsourcing. This was 2006, just one year after the launch of Amazon Mechanical Turk, which was not yet well-known in the NLP community, and the same year in which the term 'crowdsourcing' was invented. Our inspirations were Wikipedia (hence the name of the project, 'AnaWiki'), the work by Push Singh and colleagues on Open Mind Common Sense (Singh, 2002), which would result in ConceptNet, and the Semantic MediaWiki project. However, when Jon Chamberlain (another developer of Phrase Detectives) started working on the project in November 2007, he quickly became aware of Luis von Ahn's work on crowdsourcing, and adopted this method. Within a few weeks we had the first design for Phrase Detectives, the first version of which was ready in January 2008.
I should also mention that, especially at the beginning, the game was very much a collaboration with colleagues working on XML infrastructure for NLP from the University of Bielefeld in the SEKIMO project - Nils Diewald, Daniela Goecke, Daniel Jettka, and Maik Stührenberg, who developed the SGF databases we still use in Phrase Detectives. Daniela & co had developed a very nice web interface for annotation called Serengeti; our idea was to develop different platforms for different types of web collaborators - Serengeti would be the interface for linguists, Phrase Detectives for players.
What was the initial participant response to the game?
Slow but steady is how I recall it. I see from my email records that we started user testing Phrase Detectives intensively in the spring of 2008, and then we did the first proper advertising through the local press in September 2008, at about the same time as our first paper appeared (Chamberlain et al., 2008). Our advertising also included postcard-size flyers, which we distributed at LREC 2008.
I have been keeping semi-regular records of the number of players and annotations; the first such record I have is from 6th January 2009, when we only had 189 players and 14,000 judgments. However, in 2009 we started proper advertising - Jon did an interview on BBC Essex, and Mark Liberman mentioned Phrase Detectives on his blog - and this brought us almost 500 new players. So by the end of March 2009 we had about 700 players and 140,000 judgments; by the end of June we reached 1,000 players and 434,000 judgments. We completed our first 50 documents by the end of July, and one year after the proper start of advertising, on 6th January 2010, we had over 1,500 players and 164 completed documents.
What specific research did the data from Phrase Detectives contribute to?
The original motivation for Phrase Detectives was to collect data about disagreement on anaphoric interpretation, and this goal was achieved. The Phrase Detectives Corpus, the third release of which is about to come out, is the largest corpus of data about disagreement for any phenomenon, and was the main resource used in the DALI project, 'Disagreements in Language Interpretation', for work on how to train and evaluate NLP models on data containing disagreements. We are also in the process of writing papers on ambiguity in anaphora from a linguistic perspective, although this work is progressing more slowly.
How can participant data from Phrase Detectives help improve language technology?
Data from Phrase Detectives opened up the possibility of testing and evaluating methods for resolving anaphora in two domains that were very understudied in the past: fiction and encyclopedic data. Additionally, participant data has made the communities developing language technology more aware of the inherent ambiguities underlying certain language phenomena.
Special thanks to Massimo Poesio and Udo Kruschwitz for their time and contribution!