Solving Listener Crossword 4151 with Mathematica

12 October 2011

MathsJam is a monthly meeting for people interested in recreational mathematics, and drinking of beer. At the Edinburgh meeting in August, we were presented with this Listener Crossword, from The Times Crossword Club. At the time of writing, there are plenty of solutions on the web from those who tackled the puzzle in the traditional manner of brain, pen and paper. I opted to cheat.

The description of the puzzle itself is quite complex, but in a nutshell the task is to substitute each of the given letters for one of the first ten prime numbers ( with no duplication ). The expression for each clue will then give a numerical solution. Substitute each digit in the solution with the first letter of its name and write the result in the crossword. There are two ways of substituting digits for letters, either in English or in German, but a solution to a given clue must be consistently one language or the other.

An Algorithmic Solution

I have decided to use Mathematica due to its built in symbolic evaluation capabilities, but in principle any sufficiently powerful programming language would do. If you wish you can download a copy of the mathematica notebook.

The goal of the algorithm will be to find two sets of rules that map the first 10 primes to the given letters, which give valid solutions. One straightforward way to achieve this is to enumerate every single possibility until we find two such sets. So how many possibilities are there?

There are \( 10! = 3,628,800 \) possible sets of rules to examine. However, for each of the 27 clues, the solution generated by each rule set can be one of two options: English or German. That means that the total number of candidate solutions is

\[ 10!\times 2^{27}=487,049,291,366,400 \]

This is what’s called the ‘solution space’, and at a rate of 1000 candidates per second, it would take over 15,000 years to examine them all. Clearly this kind of brute force enumeration is impractical.

However, most of these solutions will be invalid in some way. In fact, if we trust the crossword compilers, all but two of them will be invalid. Some rules will produce solutions of the wrong size, others may conflict with other solutions in the crossword, and some may be invalid from a mathematical point of view ( such as a division that does not produce a whole number ). If instead of trying to solve the puzzle in one go, we construct ‘partial’ solutions in some systematic fashion, we could avoid exploring parts of the solution space that cannot possibly lead to a correct solution. This is the principle behind the backtracking search, one of the cornerstones of artificial intelligence.

The notion of a partial solution comes quite easily when considering a crossword puzzle, as it is simply a puzzle that is incomplete - ie, the solutions to at least one clue has been filled in. As an example, consider clue 19 Down : \( A^2 \).

There are ten ways of mapping the single letter in 19 down to each prime

In[1]  := rules = {{a -> 2}, {a -> 3}, {a -> 5}, {a -> 7}, {a -> 11}, {a -> 13}, {a -> 
   17}, {a -> 19}, {a -> 23}, {a -> 29}}

In[2]  := a^2/.rules
Out[2] = {4, 9, 25, 49, 121, 169, 289, 361, 529, 841}

Since we know 19 Down must be three letters long, immediately we can discard the first four candidates. The remainder are all possibilities, since at this point we have no more information. Lets take the first matching possibility, \( A^2=121 \) given by the rule \( A=11 \).

In English, this gives

In[3]  := IntegerDigits[121] /. {1 -> 'O', 2 -> 'T'}
Out[3] = {'O', 'T', 'O'}

If we now look at 21 Across - \( ABI \) - There are 2 symbols to map to the remaining 9 primes ( “A” has already been mapped to 11 by 19 Down ). The number of ordered k-subsets of n is given by,

\[ _nP^k = \frac{n!}{(n-k)!} \]

So we have \( 9!/7!=90 \) new sets of rules to examine

In[4] := rules = {a -> 11, b -> #[[1]], i -> #[[2]]} & /@ 
 Permutations[{2, 3, 5, 7, 13, 17, 19, 23, 29}, {2}] 
Out[4] = {{a -> 11, b -> 2, i -> 3}, {a -> 11, b -> 2, i -> 5},...

In[5] := a b i/. rules
Out[5] = {66, 110, 154, 286, 374, 418, 506, 638, 66,...

Again, many of the partial solutions produced by these rules are invalid. Only 36 are the required 4 digits long, and we also now have information about the form of this solution. We know that the first letter is “T”, given to us by the puzzle setter, and also that the last letter is “T”, because of our solution to 19 Down : “OTO”.

In[6] := candidates = Select[a b i /. rules, Length[IntegerDigits[#]] == 4 &]
Out[6] = {1045, 1265, 1595, 1001, 1309, 1463, 1771, 2233,...

In[7] := englishRules = {0 -> 'Z', 1 -> 'O', 2 -> 'T', 3 -> 'T',...
In[8] := germanRules = {0 -> 'N', 1 -> 'E', 2 -> 'Z', 3 -> 'D',...
In[9] := candidates = StringJoin @@@ ((IntegerDigits /@ candidates)
   /. Join[english, german])
Out[9] = {'OZFF', 'OTSF', 'OFNF', 'OZZO', 'OTZN', 'OFST',...

In[10] := Select[candidates, StringMatchQ[#, RegularExpression['T..T']] &]
Out[10] = {'TTTT', 'TFFT', 'TTTT', 'TFFT'}

Note the duplicates. This is because \( B I = I B \), but we cannot eliminate them because the rules used to produce them are different, and so will lead to different partial solutions. By applying the constraints systematically, we have reduced the number of potential solutions for 21 Across from 90 to just 4.

We then repeat this process in a recursive manner, using one of the valid solutions to 21 across, and the rules that were used to generate it, to examine the next clue. We continue in this fashion until either the crossword is filled, or we reach a dead end ( which is more likely ). If we reach a dead end, we backtrack ( hence the name ) to one of the possibilities we haven’t tried yet.

Why is this approach any more efficient than the brute force enumeration described above? The key is the generation of partial solutions in a tree like fashion, and the ability to eliminate them when we know that they could not possibly form part of a full solution. When we eliminate such a partial solution because it didn’t satisfy the constraints, we eliminate all the possible solutions leading from that branch. For example, when we eliminated the rule \( A=2 \) because it produced a solution to 19 Down that was too short, it eliminates not just one solution but all \( 9!\times 2^{27}=48,704,929,136,640 \) solutions incorporating that rule. Not bad for a single calculation.

The Solution

In abbreviated form, the algorithm looks something like this.

ExamineNextCandidate[{solved_, unsolved_}, 
  rules_] := {solved, rules} /; Length[unsolved] == 0

ExamineNextCandidate[{solved_, unsolved_}, rules_] := Module[{...},
   {clueIdx, cluePattern} = First[unsolved];
  nextRuleSet = 
   GenerateRules[GetSymbolsInClue[ClueExpression[clueIdx]], rules];
  solution = Null;
  While[Length[nextRuleSet] > 0 && solution == Null,
   nextRule = First[nextRuleSet];
   nextRuleSet = Rest[nextRuleSet];
   candidates = 
    StringJoin @@@ (IntegerDigits[
        ClueExpression[clueIdx] /. nextRule] /. {english, german});
   candidates = FilterCandidatesByConstraints[candidates];
   If[Length[candidates] > 0,
     nextPatterns = 
      GenerateNewPatterns[unsolved, {#, clueIdx}] & /@ candidates;
     nextArgs = {Append[solved, First[#]], Rest[#]} & /@ nextPatterns;
     While[solution == Null && Length[nextArgs] > 0,
      solution = ExamineNextCandidate[First[nextArgs], nextRule];
      nextArgs = Rest[nextArgs]

Note that the function ExamineNextCandidate calls itself until the pattern on the list of unsolved solutions is met, ie the list is empty. We call this function using an initially empty rules list, and the patterns provided by the puzzle setter.

In[1]:=ExamineNextCandidate[{{}, {...,{20, '..'}, {21, 'T...'}, {23, '....'},...}, {}]
Out[1]={...,{20, 'FE'}, {21, 'TTTT'}, {23, 'FZFO'},...}

On my dual core netbook ( A 2010 Alienware M11xR2 ), this took just over 2 seconds to find a solution.

For the second puzzle, we alter the pattern to specify that the first letter of 21 Across ( and last of 9 Down ) can be any letter except ‘T’, using the regular expression "[^T]...". In the code snippets, I have abbreviated the output somewhat. See the attached Mathematica notebook file for full details.

In[2]:=ExamineNextCandidate[{{}, {...,{20, '..'}, {21, '[^T]...'}, {23, '....'},...}, {}]
Out[2]={...,{20, 'EE'}, {21, 'FFTT'}, {23, 'FZFO'},...}

Finally, the last part of the puzzle is to complete the following task:

"Solvers must identify the English transcriptions of the answers entered using German in one of the grids, which, in clue order, form two words of equal length that are to be entered below the grids, one on each side of the colon; this will relate to a claim made above."

To identify the answers entered in German, we apply the rules returned by the solver to the original expressions and use the German digit replacement rules. We then compare these to the completed grid.

In[1]:={solution,rules}=ExamineNextCandidate[{{}, {...,{20, '..'}, {21, 'T...'}, {23, '....'},...}, {}]
Out[1]={...,{a -> 11, b -> 17, d -> 19, e -> 7,...}}

In[2]:=Intersection[(clues /. rules1) /. {a_, b_} :> {a, 
    IntegerDigits[b] /. german}, solution][[;;,1]]
Out[2]={10, 11, 13, 16, 25, 29, 30}

In[3]:=(Select[clues, MemberQ[%2, First[#]] &] /. rules) /. {a_, b_} :> {a,
    StringJoin @@ (IntegerDigits[b] /. english)}
Out[3]={{10, 'SF'}, {11, 'NF'}, {13, 'FOO'}, {16, 'TNOTE'}, {25, 'NON'}, {29,
   'SENSE'}, {30, 'SSF'}}

Having identified those clues, we reevaluate the expressions this time using the English rules. There are 4 clues that are the same in both German and English, but it is pretty clear that the required words are “Footnote” and “Nonsense


Code on Github