There is no 16-Clue Sudoku
A recent paper by Gary McGuire, Bastian Tugemann and Gilles Civario, "There is no 16-Clue Sudoku: Solving the Sudoku Minimum Number of Clues Problem" (PDF), shows a new and interesting approach to the problem of the minimum number of clues in a normal Sudoku. This has been highlighted in a number of news outlets and scientific journals (e.g. Nature). (Note: Nature's article tries to reproduce the 17-clue example from the paper but they get the 43 in the wrong place, giving an invalid puzzle with 9734 solutions. The Sudoku in the paper is correct.)
It has been conjectured for some time that 17 is the minimum number of clues needed to make a single-solution Sudoku puzzle. There are about 50,000 such puzzles collected from various sources out and about on the Internet. I use one such set for calibrating and testing.
'Proof', though, needs to be qualified. This is not a mathematical proof as such but a brute force computer search through the number space within the Sudoku set, and the authors admit that a mathematical proof is still to be discovered. What is interesting about the paper is the algorithm to search the space. To search through all possible combinations would take an impractical amount of time. The authors identify Unavoidable Sets as the key to reducing the search space. In the simplest case these are sets of four cells which could be interchanged to make two solutions, and therefore at least one of those cells must be a clue.
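To make the idea concrete, here is a minimal Python sketch (not the paper's code) that finds only the simplest four-cell unavoidable sets in a filled grid: a rectangle of two rows and two columns, spanning exactly two boxes, with the same pair of digits placed crosswise. The function names and the example grid are assumptions chosen for illustration.

```python
from itertools import combinations

# An arbitrary solved grid, used only as example data.
SOLVED = [[int(ch) for ch in row] for row in (
    "534678912", "672195348", "198342567",
    "859761423", "426853791", "713924856",
    "961537284", "287419635", "345286179",
)]


def four_cell_unavoidable_sets(grid):
    """Return the rectangle-type four-cell unavoidable sets of a filled grid.

    Cells are (row, column) pairs, 0-indexed. Larger unavoidable sets exist
    and are not detected by this sketch.
    """
    found = []
    for r1, r2 in combinations(range(9), 2):
        for c1, c2 in combinations(range(9), 2):
            same_band = r1 // 3 == r2 // 3
            same_stack = c1 // 3 == c2 // 3
            # The rectangle must span exactly two boxes: same band and
            # different stacks, or same stack and different bands.
            if same_band == same_stack:
                continue
            a, b = grid[r1][c1], grid[r1][c2]
            # Digits must sit crosswise so that swapping a and b keeps
            # every row, column and box valid.
            if grid[r2][c1] == b and grid[r2][c2] == a:
                found.append(frozenset({(r1, c1), (r1, c2), (r2, c1), (r2, c2)}))
    return found


def hits_all(clue_cells, sets):
    """True if the clue cells include at least one cell of every set."""
    return all(clue_cells & s for s in sets)


if __name__ == "__main__":
    for s in four_cell_unavoidable_sets(SOLVED):
        print(sorted(s))
```

On this grid the sketch reports, among others, the cells in rows 4 and 5, columns 6 and 9 (counting from 1), which hold the digits 1 and 3. Swap them and the grid is still valid, so any puzzle built on this grid needs a clue in one of those four cells.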
Even with this insight, it is still a challenging algorithm to run. First you must obtain (or generate) all possible unique filled-in Sudoku boards, of which there are 5,472,730,538. The algorithm must also take into account higher order Unavoidable Sets (with nine numbers instead of four). So it is understandable that the computing time was still considerable.
Nevertheless, to get a result in a practical time is an achievement. I have been running some tests on my Solution Count program which show how the time to check a puzzle using a brute force method grows exponentially as the number of clues falls. Anyone familiar with the solver will know this feature. 17-clue puzzles take on average 6 seconds - although the exact orientation and placement of numbers can make this vary from 0.5 seconds to 30 seconds. Given 22 clues the average time is 0.037 seconds, and it drops to a millisecond or less with 25 clues or more. So a practical search of this space without using a trick like Unavoidable Sets is impossible.
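For readers who have not seen one, a brute force solution counter can be sketched in a few lines of Python. This is not the Solution Count program mentioned above and makes no attempt to match its timings; it is a plain backtracking search that always fills the empty cell with the fewest candidates, and the names and example grid are assumptions for illustration.

```python
# An arbitrary solved grid, used only as example data.
SOLVED = [[int(ch) for ch in row] for row in (
    "534678912", "672195348", "198342567",
    "859761423", "426853791", "713924856",
    "961537284", "287419635", "345286179",
)]


def count_solutions(grid, limit=2):
    """Count completions of a 9x9 grid (0 = empty), stopping at `limit`."""
    grid = [row[:] for row in grid]  # work on a copy

    def candidates(r, c):
        # Digits not yet used in the cell's row, column or box.
        used = set(grid[r]) | {grid[i][c] for i in range(9)}
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
        return [d for d in range(1, 10) if d not in used]

    def search():
        # Pick the empty cell with the fewest candidates.
        best = None
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    opts = candidates(r, c)
                    if best is None or len(opts) < len(best[2]):
                        best = (r, c, opts)
        if best is None:
            return 1  # no empty cells left: one complete solution
        r, c, opts = best
        total = 0
        for d in opts:
            grid[r][c] = d
            total += search()
            grid[r][c] = 0
            if total >= limit:
                break
        return min(total, limit)

    return search()


if __name__ == "__main__":
    puzzle = [row[:] for row in SOLVED]
    for r, c in [(3, 5), (3, 8), (4, 5), (4, 8)]:
        puzzle[r][c] = 0  # blank a four-cell rectangle holding 1 and 3
    print(count_solutions(puzzle))
```

Blanking those four cells leaves two ways to complete the grid, which is why a puzzle with no clue among them cannot have a single solution. The `limit` argument lets the search stop as soon as a second solution turns up, which is all that is needed to reject a puzzle.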
Some comments on the Nature page (comments down at the time of writing) assert that the number of clues determines the grade. This is not true - if you are using logical strategies for grades as I do. It is possible to have very easy 17-clue puzzles and 'extreme' 30-clue ones. The only effect of clue density is to increase the number of operations required to solve (which is one mildly additive heuristic in my grading).
Overall, an exciting paper.
Andrew Stuart
[EDIT: To quote an answer on StackOverflow: "As of today [2018], 49,157 17-clue non equivalent puzzles have been found. Studies are in progress to list all of them", and in 2021 an additional one was found. This is more complicated than it sounds, as the bigger problem is filtering "non equivalent" puzzles.]
Comments
... by: Rich sell
Mind you, any mathematical solution for a geometric arrangement is not a universal sudoku solver, but only a solution for a pre-designed, special set of PERFECT sudokus.
... by: Chuck
... by: Martin Harvey
You mention a per-solution time of six seconds. I'm assuming this is a brute force search of the space, taking the "least branches first" approach to filling the solution tree. I suspect that using the DLX algorithm to solve the "exact cover" problem (which explicitly represents all possible combinations of constraints and dependencies between them), you should be able to get a reasonable solution in milliseconds, particularly if the implementation takes into account the capabilities of modern hardware. I'm coding this up in the next couple of days and I'll get back to you if you want.
If the DLX algorithm does solve puzzles in milliseconds, then I *think* enumerating all possible 17-clue sudoku puzzles (after removing symmetries / permutations) will be amenable to a distributed approach (a la "distributed.net", "folding@home", "BOINC", "rosetta@home" etc.): a few thousand volunteers should give you about 150 million CPU hours per year. I suspect that would make the problem eminently solvable.
http://www.martincharvey.net/
MH.
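The "exact cover" view mentioned in this comment can be sketched as follows. Each of the 729 candidates (row, column, digit) satisfies four of 324 constraints: one per cell, one per row-digit pair, one per column-digit pair and one per box-digit pair. The Python below only builds that mapping, which is the input an algorithm such as DLX would search; it is not a solver, and the function names and column numbering are assumptions for illustration.

```python
from collections import Counter


def exact_cover_rows():
    """Map each candidate (row, col, digit) to the 4 constraint columns it covers.

    Column numbering (an arbitrary choice for this sketch):
      0..80     cell (row, col) is filled
      81..161   row contains digit
      162..242  column contains digit
      243..323  box contains digit
    """
    rows = {}
    for r in range(9):
        for c in range(9):
            box = 3 * (r // 3) + c // 3
            for d in range(9):  # d is the digit minus one
                rows[(r, c, d + 1)] = (
                    9 * r + c,
                    81 + 9 * r + d,
                    162 + 9 * c + d,
                    243 + 9 * box + d,
                )
    return rows


def covers_each_constraint_once(grid, rows):
    """True if a filled grid's 81 placements cover all 324 columns exactly once."""
    counts = Counter(
        col
        for r in range(9)
        for c in range(9)
        for col in rows[(r, c, grid[r][c])]
    )
    return len(counts) == 324 and set(counts.values()) == {1}


if __name__ == "__main__":
    rows = exact_cover_rows()
    print(len(rows), "candidates,", len({c for v in rows.values() for c in v}), "constraints")
```

A valid filled grid is exactly a choice of 81 candidates that covers every constraint once, which is what makes Sudoku an exact cover problem.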
... by: James Havard
(My estimate of the possible sudoku puzzles is based on Andrew Stuart's info about unique boards.)
Thanks,
James Havard
... by: James Havard
I may be walking on thin ice here but wouldn't each of these unique boards, UB, have 81 80-clue puzzles?
CLUES   Puzzles
80      81 x UB = 443,291,173,578
79      3,240 x UB = 1.77E13
78      85,320 x UB = 4.67E14
77      1,663,740 x UB = 9.105E15
76      25,621,596 x UB = 1.4E17
Roughly 15 to 40 times more puzzles for each drop in the number of clues, but it would get more complicated with fewer and fewer clues and the number of puzzles would decline. But still, adding all the possible puzzles of different clues together would be quite a lot of puzzles.
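The multipliers in this table are binomial coefficients: choosing which k of the 81 cells to blank gives C(81, k) clue patterns per board. A short Python sketch to recompute them, assuming UB = 5,472,730,538 as quoted in the article. Note that this counts clue patterns, not puzzles with a single solution, so it is only an upper bound of the kind the comment describes; the names are illustrative.

```python
from math import comb

UNIQUE_BOARDS = 5_472_730_538  # UB, the figure quoted in the article


def clue_patterns(clues):
    """Number of ways to keep `clues` of the 81 cells of one filled board."""
    return comb(81, clues)


def pattern_total(clues):
    """Clue patterns summed over all unique boards (an upper bound on puzzles)."""
    return clue_patterns(clues) * UNIQUE_BOARDS


if __name__ == "__main__":
    print("CLUES  PATTERNS PER BOARD  TOTAL")
    for clues in range(80, 75, -1):
        print(f"{clues:<6} {clue_patterns(clues):<19,} {pattern_total(clues):.3E}")
```

Because C(81, k) equals C(81, 81 - k), keeping 80 clues is the same count as removing one cell, and so on down the table.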
... by: ibrahim
...but
if you take into account all the symmetries the layout enjoys (rotation, reflection, swapping 1 for 2 etc), the number of uniquely different filled sudoku boards is
5,472,730,538
... by: bartlm
Where I would take issue with your comment on the 16 clue paper is that it is not a mathematical proof. An exhaustive search may not be elegant but if the search results are incontrovertible and anyone can check them, then the conjecture has been proven. I seem to remember that the four colour theorem was proved in this way about 25 years ago.
Have a great Christmas