Monday, January 6, 2020

Solving Sudoku on the Computer

Solving Sudoku on the Computer

By Bobby Neal Winters

Introduction

My journey toward writing a decent computer program to solve Sudoku puzzles began with free time and a question.  The free time was Christmas Break and the question was whether I could write such a program.  This journey led me to learn some new things--some new mathematics--that I can now share with you, the gentle reader.  I hope that, having read this far, you will stick around for the story of my journey.

The truth is that I wrote two such programs.  I wrote them both in Python using the pandas library.  The initial puzzles, i.e. the clues for the puzzles, were input using Excel spreadsheets.  My reason for using Python, pandas, and Excel is an ancient one: “When all you have is a hammer, everything looks like a nail.”  However, in this case, these tools were indeed sufficient for my purposes.
This first of these two programs, which I will refer to as “the Naive Program,” was very...naive.  I will expand on it briefly in the sequel.  The second program required that I learned some new mathematics.  New to me, at least. I will share this with you at length in the sequel, but it will all be kept at an informal level.

The Structure of the Sudoku Board

I am virtually positive that any of you who have made it this far are at least as familiar with Sudoku puzzles as I am, but for the sake of exposition--and setting up notation--let me explain Sudoku puzzles.

Each Sudoku puzzle begins with a 9 by 9 grid of cells that can be partitioned three ways: 9 horizontal rows and 9 vertical columns.  We number the rows beginning at the top as R1, R2, R3,...,R9.  We number the columns beginning on the left as C1, C2, C3,...,C9.  The cell where Rr meets Cc will be denoted as RrCc.

Each of these cells RrCc is ultimately to be filled with a value v that is one of the numbers 1,2, 3,..., 9. (Here I should say that any nine distinct symbols would work, but that using another symbol set offers no advantage.)

In addition to rows and columns, there is structure of 9, disjoint, 3 by 3 squares of cells that we shall denote by B1, B2, B3,..., B9.  Note that the cell RrCc will belong to the block Bb where
b = [( r - 1) // 3] * 3 + (c -1) // 3 + 1
With // denoting integer division, i.e. 4 // 3 = 1.
To fill out a Sudoku puzzle correctly, one must obey the following requirements:
  1. Each cell RrCc must contain a value v from the set {1,2,3,..., 9}.
  2. Each row Rr must contain each value from the set {1,2,3,..., 9}.
  3. Each column Cc must contain each value from the set {1,2,3,..., 9}.
  4. Each block Bb mush contain each value from the set {1,2,3,..., 9}.
  5. A Sudoku puzzle is said to be solved when all of the requirements above are met.

The Naive Program

I designed the Naive Program in the following way.  I thought of the cells in the Sudoku puzzle as being ordered like the words on a page: Every cell in a higher row precedes every cell on a lower row; given two cells on the same row, the cell on the left precedes the cell on the right.  The algorithm I follow proceeded as below:
  1. Place the cells that do not hold an initial value in a list which is ordered as described above.
  2. Set a pointer to the first cell.
  3. In the cell pointed to by the pointer, place the smallest untried value for that cell that doesn’t cause duplicate entries for the row, column, and block.  If such a value can be found go to step 5; otherwise proceed to step 4.
  4. Forget about all values tried in this cell; decrement the pointer; go back to step 3.
  5. Put the value found in step 3 in a used this for this cell; decrement the pointer; go to step 3.
This ends positively when the pointer can no longer be incremented and negatively when it can no longer be decremented.
I applied this algorithm to a small number of test cases, one of which required 3537 seconds to find a solution.  The save you the arithmetic, this works out to about an hour.  A college of mine did the same puzzle in half an hour while talking to a colleague. Clearly it may be improved upon.

A More Informed Approach

At this point, I decided to to a bit of light research.  By “light” I mean I didn’t use any source deeper than Wikipedia.  It was there (see Wikipedia: Exact Cover) that I discovered the Exact Cover Problem, the Exact Hitting Problem, and Algorithm X (Wikipedia: Knuth’s Algorithm X)  by Donald Knuth. This provides us with an opportunity to display the value of abstraction in mathematics:  An algorithm that will solve problems about finite sets of integers and Sudoku puzzles with no transparent connection between the two.

The Exact Cover and Exact Hitting problems are stated in terms of finite sets and there is a good exposition of both and Algorithm X applied to both in the links given above.  It is shown how they can be translated into a problem about a finite bipartite graph.  The bipartite graph can then be translated into a matrix of zeros and ones (the incidence matrix of the bipartite graphs), and Algorithm X can then be applied to this matrix.

Were I teaching a class on this, I would work examples based on sets at this point, but since the sources above already does a nice job of that, let me instead proceed by explaining a way of thinking that led me to a better understanding of the problem at hand.

Activities and Requirements

The totality of a Sudoku puzzle, especially using the model we will be using, is quite a big thing.  Let’s think first about a smaller problem: Graduating from school.  In order to graduate from school, there are certain requirements we must meet.  To meet these requirements, there are certain activities we can engage in. If you want to think about this as a graph, you can list the activities on the left side of a piece of paper; list the requirements on the right; and then connect each activity to the requirements it satisfies by  line segments.  Since the requirements for a given school can still be ridiculously complicated, let’s look at a simple one below.

Wildwood Academy

Wildwood Academy is an institution of higher learning not too far from here.  At Wildwood Academy, there are six requirements that need to be met.  They are denoted as follows: Woodsmanship, Nature, Religion, Culture, Survival, and Art.  The activities that are offered to satisfy these requirements are: Noodling, Moonshine, Shape Note Singing, Bible Study, Water Witching, Indian Lore, and Square Dancing.  While there has been (and no doubt will be) endless debate as to which activity satisfies which requirement, the following is what is currently on paper:
  1. Noodling satisfies Woodsmanship, Nature, and Survival;
  2. Moonshine satisfies Nature. (This has been debated vociferously.)
  3. Shape Note Singing satisfies Religion and Art;
  4. Bible Study satisfies Religion and Culture;
  5. Water Witching satisfies Woodsmanship, Nature, and Survival;
  6. Indian Lore satisfies Woodsmanship, Nature, and Culture; and
  7. Square Dancing satisfies Art.
These are summarized in a matrix as below:
The empty cells can be thought of as having zeros in them.

Like students everywhere, the folks who attend Wildwood Academy don’t necessarily want to take more classes than they have to take in order to satisfy their requirements.  Because of this, the rule has been legislated than no requirement can be satisfied twice.  This is ridiculous and self-defeating, of course, but it is nonetheless the rule.

We will state an algorithm formally in a later section, but before we will work through an example heuristically, explaining our reasons as we go alone.  The example is in the figure below which will be explained by the text that follow the images:


Heuristically, we want to make sure the requirements are satisfied, so we first choose the requirements that have the fewest activities that satisfy them.  For example, Religion, Culture, and Art each have only two activities that satisfy them.  We choose religion because it is leftmost and for no other reason.

Religion is satisfied by Shape Note Singing and Bible Study.  We choose Shape Note Singing because it is listed before Bible Study and for no other reason.  Observe that Shape Note Singing satisfies Art in addition to Religion.  Since Art and Religion are satisfied and we don’t want anything satisfied twice we eliminate the activities that satisfy those.  We are are left with Woodsmanship, Nature, Culture, and Survival to be satisfied and with Noodling, Moonshine, Water Witching, and Indian Lore to satisfy them.

Going to the next stage, we note that Culture has a single activity that satisfies it, namely Indian Lore.  Indian Lore also satisfies Woodsmanship, and Nature. As Woodsmanship and Nature are collectively satisfied by Noodling, Moonshing, and Water Witching, we remove them all.  Having done this, all of the activities have been removed, but the Survival requirement remains.  We must back up.

Going up one level is not even as there was no choice of activities there. We must go to the first level.  There we chose between Shape Note Singing and Bible Study.  Let’s choose Bible Study on this occasion.  Doing this will drive us to choosing Square Dancing which, in turn, will drive us to choose Noodling.  

We may check this against the original matrix to verify that Noodling, Bible Study, and Square Dancing will satisfy Wildwood Academy’s requirements.

Algorithm X

It is my hope that the previous example will provide motivation for Algorithm X but, failing that, will at least provide enlightenment for someone trying to understand the algorithm.

I am not presenting Algorithm X in its greatest generality, but in a form that more transparently connects with our application.  With this caveat, let me say that the algorithm assumes that, as in our example, we are trying to satisfy each of our requirements with exactly one activity, and that we have our activities and requirements organized in an incidence matrix as in that example.

The Algorithm

  1. If your matrix has no columns, you are done as all of your requirements have been satisfied.
  2. Choose the column of your matrix that has the fewest ones.  (This assumes that no column has zero ones.)
  3. Collect all of the rows that are incident to your column and put them on a stack.  You will return to this stack if the algorithm hits a dead end. 
  4. Pop a row off of that stack created in step 3.  Push this onto a new stack called the partial solution stack.
  5. Take the newest row from the partial solution stack and determine all of the columns it is incident to. Determine all of the rows that are incident to all of these columns; this will include the original row.  Create a new matrix by dropping all of these rows and columns.
  6. If this new matrix has a column that is all zeros, pop the row off of the top of the partial solution stack and go back to step 4. Otherwise apply step 1 recursively to the new matrix.
It turns out that this matrix is easy to implement when one uses a pandas dataframe as a matrix.

Setting up the Matrix for Sudoku

We will now describe how this algorithm can be used to solve a Sudoku puzzle that the construction of an appropriate matrix.  Earlier we listed out the rules of Sudoku.  Let’s do it again thinking more specifically in terms of the context of requirements and activities, using precise symbolism.

Requirements

  1. RrCc is the requirement of having some value in the cell RrCc.
  2. Rr#v is the requirement of having the value v in the row Rr.
  3. Cc#v is the requirement of having the value v in the column Cc.
  4. Bb#v is the requirement of having the value v in the block Bb.
There are 81 each of the requirements (1) through (4) which comes to a total of 324 requirements.

Activities

What are the activities?  The sole type of activity consists of putting a particular value in a particular cell.  We will let RrCc#v denote the activity of putting the value v in the cell RrCc.  There are 81 cells with 9 values, so the comes to 729 activities.  Our matrix will by 729 by 324.  What will its entries be?

Matrix Entries in General

Note that RrCc#v satisfies RrCc, Rr#v, Cc#vk, and Bb#v, where b is calculated as in the section on the structure of the Sudoku board. Conversely, note that RrCc is satisfied by RrCc#w for any value w; Rr#v is satisfied by RrCd#v for any column Cd; Cc#v is satisfied by RsCc#v for any row Rs; and Bb#v is satisfied by RsCd#v for any s, d that will yield b according to our formula.

The Matrix for a Particular Puzzle

As was noted before, the incidence matrix for all activities and all requirements is 729 by 324.  However, each Sudoku puzzle begins with some cells that have already been filled in.  That is some activities have already satisfied some requirements.  This means we will have to drop some rows and columns from our matrix before we apply Algorithm X to it.

Let us consider the puzzle that has initial data as below:
Note that there are 29 cells that have been filled in with values; 29 activities have been completed.  Rather can calculate all of the rows and columns to be eliminated with all of these activities, let’s pick one activity that is not too special: R2C4#1.

It is easy to see at once the following requirements are satisfied: R2C4, R2#1, C4#1, and B2#1 so that the corresponding columns will have to be eliminated, but it doesn’t stop there.  Rows corresponding to the following activities:

R2C4#1, R2C4#2, R2C4#3, R2C4#4,..., R2C4#9
R2C1#1, R2C2#1,R2C3#1, R2C4#1,..., R2C9#1
R1C4#1, R2C4#1, R3C4#1, R4C4#1,..., R9C4#1
R1C4#1, R1C5#1, R1C6#1, R2C4#1, R2C5#1, R2C6#1, R3C4#1, R3C5#1, R3C6#1
will also have to be eliminated. 

Perhaps I am making a mistake in giving so much emphasis to this part of the implementation but I lots many fruitless hours in debugging my implementation of Algorithm X because I got the matrix wrong.

Once the initial matrix is modified with respect to the initial data, Algorithm X works remarkably well in producing a solution.  For example, the puzzle that took 3537 seconds to solve with the Naive program was solved in just over a second by Algorithm X.

Concluding Remarks and Future Directions

While I am incredibly pleased by improving the execution time by a factor of 3000, I am still impressed that a human being could solve the puzzle twice as fast as a computer operating naively.  Those of you who have played Sudoku seriously--and some of you are very serious--know some shortcuts.  One question is whether these could be worked into an algorithm. Another question that has captured my interest, given my interest in Python computing, is whether machine learning (or artificial intelligence or whatever you call it) in the form of a neural network can be taught to solve Sudoku well.  I am putting this on my list for possible future activities.