Neural crossword solver outperforms humans for the first time

Crossword puzzles are one of mankind’s most popular pastimes. They consist of a set of answer clues that form an overlapping word grid. The goal is to complete the grid.

Crossword puzzles are an area of ​​activity where humans significantly outperform machines. Indeed, the most prestigious competition is the American Crossword Puzzle Tournament, which is held every year and has always been won by a human.

In 2017, a machine called Dr Fill earned a respectable 11th place out of over six hundred competitors. But a machine had never won the American crossword tournament. Until last year.

Enter the Berkeley Crossword Solver, an automatic crossword solving program developed by Eric Wallace and other members of the natural language programming team at the University of California, Berkeley, with Matt Ginsberg, the original developer of the Dr. Fill program.

First place

The new program outperformed all human competitors in the 2021 US Crossword Tournament.”[This] This is the first time that a computer program has exceeded human performance in this event,” the researchers say. “Our system won first place against the top 1,100 human solvers. »

Machines find crossword puzzles difficult because they consist of several distinct types of clues that require different approaches to solve. For example, knowledge-based clues require an understanding of history or pop culture or other forms of trivia. The team gives this as an example: CLUE: Book after Song of Songs A: ISAIAH.

A pun clue forces players to reason about anagrams, puns, or words with similar meanings. For example, HINT: A followed by nothing, A: TEN.

And common sense clues require a realistic understanding of the real world. For example, HINT: cause of a stain, A: WETink.

Answers made up of more than one word are particularly tricky for programs because the number of possible answers increases considerably.

In contrast, word definitions are often simple to find automatically by a machine through data mining. For example, HINT: Inhabitant of tusk savannah A: WARTHOG.

Berkeley Crossword Solver’s approach is simple. It starts with a neural questioning response algorithm that produces candidate solutions for all clues along with a score suggesting the quality of the solution.

This algorithm is trained on a large database of over six million question-answer pairs from historical crossword puzzles dating back 70 years. It also uses an open source natural language AI called GPT-2 to help with word segmentation.

Then it uses these suggested answers to populate the grid. Without a perfect set of answers, this process is tricky because of the many conflicts that arise, requiring one candidate solution to be favored over another.

The Berkeley team uses a process called belief propagation to solve this problem. This aims to produce a solution with the highest overlap with the expected solution, rather than a solution that prefers words with higher scores from the question answering process.

The result is a solution that is often close to the correct one but with various small errors. So the last step is a second pass to the puzzle where the program examines alternative solutions that are within one small modification. The program notes the resulting solutions and repeats itself until there are no more possible improvements.

Competent solver

The result is a high performance crossword solving program. “Our system outperforms even the best human solvers and can solve puzzles in a wide range of areas with pinpoint accuracy,” the team says.

However, they would like to point out that the crossword is not “solved” yet. Instead, the Berkeley Crossword Solver is optimized for certain types of puzzles popular in the United States, such as the famous New York Times crossword. “Compared to existing approaches, our system improves exact puzzle accuracy from 57% to 82% on crosswords of The New York Times», explains the team.

But it does not work well on other types. In particular, the program cannot tackle cryptic crosswords of the style popular in the UK. “Cryptic crosswords involve a different set of conventions and challenges, for example, more metalinguistic reasoning cues such as anagrams, and likely require different methods than what we propose,” admits the Berkeley team.

This is interesting work showing significant advances over previous solvers. It also demonstrates that humans still dominate the world of crossword solving; for how long is hard to say.


Ref: Automated Crossword Solver: arxiv.org/abs/2205.09665