Solving NetHack's Mastermind
Note: This article was originally written in 2006. It is published here for posterity.
NetHack's Mastermind is very similar to the real Mastermind, with three changes:
- Instead of six colors, there are seven notes (A, B, C, D, E, F, and G).
- There are five positions, not four.
- You can give partial solutions (such as AB even though the solution is five notes long).
A gear represents a note in the correct position. A tumbler represents a note in an incorrect position. So, for example, if you're trying to find the tune FFAAB and you guess AFAFB, you'll get three gears and two tumblers. If you guess CA you'll get one tumbler. If you guess BBBBB you'll get one gear. If you guess DDDDD you'll get silence (in other words, no gears and no tumblers).
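The scoring rules above can be sketched in Python. This is only a sketch: it assumes ordinary Mastermind scoring applied to however many notes were played, and the name `score` and the use of `collections.Counter` are illustrative choices rather than NetHack's actual code.

```python
from collections import Counter

def score(tune, guess):
    """Return (gears, tumblers) for a guess, which may be shorter than the tune."""
    # Gears: same note in the same position.
    gears = sum(1 for t, g in zip(tune, guess) if t == g)
    # Notes on each side that were not matched as gears.
    tune_left = Counter(t for i, t in enumerate(tune)
                        if i >= len(guess) or guess[i] != t)
    guess_left = Counter(g for i, g in enumerate(guess)
                         if i >= len(tune) or tune[i] != g)
    # Tumblers: leftover guess notes that pair up with leftover tune notes.
    tumblers = sum((tune_left & guess_left).values())
    return gears, tumblers

print(score("FFAAB", "AFAFB"))  # (3, 2)
print(score("FFAAB", "CA"))     # (0, 1)
print(score("FFAAB", "BBBBB"))  # (1, 0)
print(score("FFAAB", "DDDDD"))  # (0, 0), i.e. silence
```

The four calls reproduce the four examples in the paragraph above.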
This document describes my attempts at optimizing a NetHack Mastermind solver (with help from Sean Kelly and papers from Don Knuth and Barteld Kooi). The first, naïve algorithm finds the solution on average within 15 turns, possibly taking up to 21. The last, most effective algorithm finds the solution on average within 5 turns, possibly taking up to only 6. This is an important result because in NetHack, the level where you play Mastermind is one of the most dangerous, so you have a strong incentive (your own survival) to find the solution as quickly as possible.
Algorithm: Naïve 1
The first tune we always test is A. If that's a gear, then we're done with the first position. We don't know if any more As are in the tune, so our next stab is AA. Back to the first play though: if A produces a tumbler, then we skip it for now and move on to B. If we get silence, then we know A is not in the tune at all, so we remove it from any further consideration. Repeat this until we have found the tune.
This basic algorithm is straightforward and works well enough. This is roughly what I do when I am playing NetHack, since it can be done without assistance.
So let's use this algorithm to try to find a particular tune:
- A: one tumbler (we know there's an A but not in the first position, so skip A for a while)
- B: one tumbler (ditto. skip B for a while)
- C: one gear (great! first position is C. Try C again)
- CC: one gear (now we know there are no more Cs in this tune)
- CD: two gears (second position is D. try D again)
- CDD: two gears (now we know there are no more Ds in this tune)
- CDE: two gears (now we know there are no Es in this tune)
- CDF: three gears (third position is F. try F again)
- CDFF: three gears (so we know there are no more Fs in this tune)
- CDFG: three gears (so we know there are no Gs in this tune)
- CDFA: three gears, one tumbler (we know there's an A but not in the fourth position, so skip A for a while)
- CDFB: four gears (fourth position is B. try B again)
- CDFBB: four gears (so we know there are no more Bs in this tune)
- CDFBA: five gears (nailed it!)
This first algorithm looks like this in pseudocode:
notes = A, B, C, D, E, F, G
tune = ""
while tune.length < 5
  try tune + notes[0]
  if any tumblers
    # move this note to the end of the note list
    # we're guaranteed to see the correct note before we see this one again
    note = notes.delete_at(0)
    notes.push(note)
  elsif gears > tune.length
    # correct note
    tune += notes[0]
  else
    # silence: no more of this note in the tune
    notes.delete_at(0)
  end
end
Here are its results:
- Average turns per tune: 14.69 (246880 turns to solve all tunes / 16807 tunes)
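For readers who want to replay the walk-through, here is a hedged Python sketch of Naïve 1. The function names `score` and `naive1` are hypothetical, and `score` assumes ordinary Mastermind scoring applied to partial guesses; it is not the original Perl or C code.

```python
from collections import Counter

def score(tune, guess):
    """(gears, tumblers) for a possibly partial guess; assumed scoring rule."""
    gears = sum(1 for t, g in zip(tune, guess) if t == g)
    tune_left = Counter(t for i, t in enumerate(tune)
                        if i >= len(guess) or guess[i] != t)
    guess_left = Counter(g for i, g in enumerate(guess)
                         if i >= len(tune) or tune[i] != g)
    return gears, sum((tune_left & guess_left).values())

def naive1(tune):
    """Play Naive 1 against a known tune and return every guess made."""
    notes = list("ABCDEFG")
    found = ""          # the prefix of the tune confirmed so far
    guesses = []
    while len(found) < 5:
        guess = found + notes[0]
        guesses.append(guess)
        gears, tumblers = score(tune, guess)
        if tumblers:
            # The note occurs later in the tune: rotate it to the back.
            notes.append(notes.pop(0))
        elif gears > len(found):
            # The new note is in the right place: extend the prefix.
            found += notes[0]
        else:
            # Silence for the new note: none of it is left in the tune.
            notes.pop(0)
    return guesses

print(naive1("CDFBA"))
```

Run against the tune CDFBA, this produces the same fourteen guesses as the walk-through above, ending with CDFBA.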
Algorithm: Naïve 2
The first optimization we can make is that, if at any time we have exactly one possible note left, the rest of the tune must consist solely of that note.
notes = A, B, C, D, E, F, G
tune = ""
while tune.length < 5
  if notes.size == 1
    # only one candidate note left: it fills the rest of the tune
    tune += notes[0] while tune.length < 5
    last
  end
  try tune + notes[0]
  if any tumblers
    # move this note to the end of the note list
    # we're guaranteed to see the correct note before we see this one again
    notes.push(notes.delete_at(0))
  elsif gears > tune.length
    # correct note
    tune += notes[0]
  else
    notes.delete_at(0)
  end
end
- Average turns per tune: 13.55 (227729 turns / 16807 tunes)
- Average turn savings: 1.14
Algorithm: Naïve 3
The next savings (which will be minor compared to the previous one) is if we've seen five distinct notes, we can rule out any notes not among those five:
notes = A, B, C, D, E, F, G
tune = ""
seen = [] # distinct notes known to be in the tune
while tune.length < 5
  if seen.size == 5
    foreach element in notes
      delete it if it's not in seen
    end
  end
  if notes.size == 1
    tune += notes[0] while tune.length < 5
    last
  end
  try tune + notes[0]
  if any tumblers
    seen.add(notes[0]) unless seen.contains(notes[0])
    # move this note to the end of the note list
    # we're guaranteed to see the correct note before we see this one again
    notes.push(notes.delete_at(0))
  elsif gears > tune.length
    # correct note
    tune += notes[0]
    seen.add(notes[0]) unless seen.contains(notes[0])
  else
    notes.delete_at(0)
  end
end
- Average turns per tune: 13.50 (226896 turns / 16807 tunes)
- Average turn savings: 0.05
Algorithm: Naïve 4
So much for the obvious optimizations. Let's try tweaking some details of the algorithm a bit just to see if it helps or hurts.
You know how when we get a tumbler we move the first note to the last position? Let's try doing that when we get a gear, too. Turns out this saves us almost .25 turns.
notes = A, B, C, D, E, F, G
tune = ""
seen = [] # distinct notes known to be in the tune
while tune.length < 5
  if seen.size == 5
    foreach element in notes
      delete it if it's not in seen
    end
  end
  if notes.size == 1
    tune += notes[0] while tune.length < 5
    last
  end
  try tune + notes[0]
  if any tumblers
    seen.add(notes[0]) unless seen.contains(notes[0])
    # move this note to the end of the note list
    # we're guaranteed to see the correct note before we see this one again
    notes.push(notes.delete_at(0))
  elsif gears > tune.length
    # correct note
    tune += notes[0]
    seen.add(notes[0]) unless seen.contains(notes[0])
    # new in this version: rotate on a gear as well
    notes.push(notes.delete_at(0))
  else
    notes.delete_at(0)
  end
end
- Average turns per tune: 13.27 (223060 turns / 16807 tunes)
- Average turn savings: 0.23
Algorithm: Knuth 1
Due to arcanehl's constant (yet ever friendly) prodding, I picked this code up again. He started implementing Knuth's Mastermind algorithm, which, while slower, is much more effective. The average number of turns to solve a tune drops from about 13.3 to about 5.5!
It's a radical change from the algorithm used above, so allow me to explain. The basic idea is after a guess, we eliminate any possibilities that would not produce the same score. For example, if we try AAABB and get 2 gears, 0 tumblers, then we can eliminate any possibility that does not produce 2 gears and 0 tumblers against AAABB (such as CCCCC which would produce 0 gears and 0 tumblers). The initial guess is hardcoded to be AAABB though it could be something entirely different. The next guess is (currently) taken arbitrarily; we use the first possibility.
I changed from Perl to C during testing for speed: arcanehl's Ruby implementation of this algorithm takes hours, mine takes less than twenty-five seconds. Of course, these are the results of timing every possible tune (there are 16807 of them). Going through the algorithm for just one tune will be fast enough at a hundredth the speed.
possibilities = ... # each possible tune
while possibilities.size > 1
  if possibilities.size == 16807 # first guess
    guess = "AAABB"
  else
    guess = possibilities[0]
  end
  real_score = try guess
  possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
- Average turns per tune: 5.54 (93103 turns / 16807 tunes)
- Average turn savings: 7.73
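The elimination step can be sketched in Python as follows. This is a hedged illustration, not the C implementation described above: `score` and `knuth1` are hypothetical names, the tunes are enumerated in alphabetical order (an assumption, so the "first possibility" may differ from the original's), and the loop keeps playing until it hears five gears.

```python
from collections import Counter
from itertools import product

def score(tune, guess):
    """(gears, tumblers); assumed standard Mastermind scoring."""
    gears = sum(1 for t, g in zip(tune, guess) if t == g)
    tune_left = Counter(t for t, g in zip(tune, guess) if t != g)
    guess_left = Counter(g for t, g in zip(tune, guess) if t != g)
    return gears, sum((tune_left & guess_left).values())

def knuth1(tune, first_guess="AAABB"):
    """Guess, then keep only the tunes that would have scored the same."""
    possibilities = ["".join(p) for p in product("ABCDEFG", repeat=5)]
    guesses = []
    guess = first_guess
    while True:
        guesses.append(guess)
        real_score = score(tune, guess)
        if real_score == (5, 0):
            return guesses
        # A tune stays possible only if this guess would score the same
        # against it as it did against the real tune.
        possibilities = [p for p in possibilities
                         if score(p, guess) == real_score]
        guess = possibilities[0]

print(knuth1("CDFBA"))
```

The real tune always survives the filter and the guess just played never does, so the list shrinks on every turn and the loop terminates.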
Algorithm: Knuth 2
About the only possible improvement we can make is to generate a better guess than always using the first possible tune. I figured the median tune (or close enough to it) would be better than the first, and I was right, saving over half a turn.
possibilities = ... # each possible tune
while possibilities.size > 1
  if possibilities.size == 16807 # first guess
    guess = "AAABB"
  else
    guess = possibilities[possibilities.size/2]
  end
  real_score = try guess
  possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
- Average turns per tune: 4.92 (82643 turns / 16807 tunes)
- Average turn savings: 0.62
Algorithm: Knuth 3
Mysteriously, we find that the best initial tune is AABBC, not AAABB. (that's called foreshadowing)
This is the algorithm that is used in Rodney3. To save memory, all he remembers for each player is the results of each tune. Naturally there's a fair amount of reprocessing, but a 16807-element array would be kind of large to save for each person (and I'm afraid of Perl bitfields).
possibilities = ... # each possible tune
while possibilities.size > 1
  if possibilities.size == 16807 # first guess
    guess = "AABBC"
  else
    guess = possibilities[possibilities.size/2]
  end
  real_score = try guess
  possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
- Average turns per tune: 4.85 (81446 turns / 16807 tunes)
- Average turn savings: 0.07
Algorithm: Knuth 4
Enough goofing around. Time to implement the rest of Knuth's algorithm: at each step, guess the possibility that would eliminate the most possibilities in the worst case. This means that no matter what the response is for the guess, we'll eliminate a maximum number of possibilities. The way I implement it has time complexity O(n^2) where n is the number of tunes left; I don't know if there's a more efficient solution. For each guess i we score each possible tune j. The worst case for i is equal to the largest number of remaining tunes after playing it (given all possible responses, such as three tumblers and one gear). Then we find the smallest worst case (i.e., the smallest remaining number of tunes) over all i and we play it!
This takes hours even in my somewhat tuned C implementation. Using the median element as above may be more appropriate if you're playing all 16807 tunes, since this extra processing is probably not worth the turn savings. Otherwise, if you're just finding one tune as in an actual game of NetHack, you'll of course want to use the best algorithm available.
Note: we found the best first tune (AABBC) by running this code without the hardcoded first guess. The code obviously produces the same results without the hardcoded first guess, but there's no reason to do all that processing (which is a lot for all 16807 tunes) when we know the best first guess is always AABBC.
possibilities = ... # each possible tune
while possibilities.size > 1
  if possibilities.size == 16807 # first guess
    guess = "AABBC"
  else
    best_worst_case = possibilities.size + 100
    foreach i in possibilities
      remaining = [] # number of tunes left for each response; starts at 0
      foreach j in possibilities
        response = score(j, i)
        remaining[response] += 1
      end
      worst_case = remaining.max
      if worst_case < best_worst_case
        best_worst_case = worst_case
        guess = i
      end
    end
  end
  real_score = try guess
  possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
- Average turns per tune: 4.61 (77422 turns / 16807 tunes)
- Average turn savings: 0.24
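The worst-case selection step on its own can be sketched in Python. This is an illustration under assumptions: `score` and `choose_minimax` are hypothetical names, and the three-tune list is a toy example chosen so the result can be checked by hand.

```python
from collections import Counter

def score(tune, guess):
    """(gears, tumblers); assumed standard Mastermind scoring."""
    gears = sum(1 for t, g in zip(tune, guess) if t == g)
    tune_left = Counter(t for t, g in zip(tune, guess) if t != g)
    guess_left = Counter(g for t, g in zip(tune, guess) if t != g)
    return gears, sum((tune_left & guess_left).values())

def choose_minimax(possibilities):
    """Return (guess, worst_case) with the smallest worst-case remainder."""
    best_guess, best_worst_case = None, len(possibilities) + 1
    for i in possibilities:
        # How many tunes would remain after each possible response to i.
        remaining = Counter(score(j, i) for j in possibilities)
        worst_case = max(remaining.values())
        if worst_case < best_worst_case:
            best_guess, best_worst_case = i, worst_case
    return best_guess, best_worst_case

print(choose_minimax(["AAAAA", "AAAAB", "AAABA"]))  # ('AAAAB', 1)
```

In the toy list, guessing AAAAA cannot tell AAAAB from AAABA (both score four gears), so its worst case leaves two tunes. Guessing AAAAB gives a different response for each of the three tunes, so its worst case leaves one.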
Algorithm: Breadth 1
I was talking in #nethack and someone brought up Mastermind. So that revived this whole thing. I was idly searching on the arXiv for Mastermind papers, and only found the one that proves Mastermind is NP-complete, which while good to know, doesn't directly help me improve my solving algorithms. However, I then remembered Google Scholar and searched for Knuth's paper. I couldn't find it, but it did return a few dozen papers that referenced Knuth's, including one by a Mr. Barteld Kooi titled "Yet Another Mastermind Strategy." The gist of the algorithm is that instead of taking the best worst case scenario (as we do in Knuth's algorithm), take the tune that would produce the largest number of unique responses. So a tune that can return six unique responses is valued over a tune that can only return three unique responses. In the author's words, "[In this strategy, o]ne only looks at the 'breadth' of a partition. On the other side of the spectrum one finds Knuth's worst case strategy, which only looks at the 'depth' of a partition." This next algorithm implements exactly that, looking only at the breadth.
Also, I've rerun the algorithm without the hardcoded first guess, but it guessed AABBC anyway, which is a convenient result.
possibilities = ... # each possible tune
while possibilities.size > 1
  if possibilities.size == 16807 # first guess
    guess = "AABBC"
  else
    best = 0
    foreach i in possibilities
      responses = [] # number of tunes giving each response; starts at 0
      foreach j in possibilities
        response = score(j, i)
        responses[response] += 1
      end
      # count the distinct responses this guess can produce
      cnt = 0
      foreach count in responses
        if count > 0
          cnt += 1
        end
      end
      if cnt > best
        best = cnt
        guess = i
      end
    end
  end
  real_score = try guess
  possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
- Average turns per tune: 4.52 (76038 turns / 16807 tunes)
- Average turn savings: 0.08
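The breadth criterion can be sketched in Python in a few lines. This is a hedged illustration: `score` and `choose_breadth` are hypothetical names, the scoring rule is the assumed standard one, and the three-tune list is a toy example rather than real solver state.

```python
def score(tune, guess):
    """(gears, tumblers); assumed standard Mastermind scoring."""
    gears = sum(1 for t, g in zip(tune, guess) if t == g)
    # Count shared notes regardless of position, then remove the gears.
    shared = sum(min(tune.count(n), guess.count(n)) for n in set(guess))
    return gears, shared - gears

def choose_breadth(possibilities):
    """Return (guess, count) for the guess with the most distinct responses."""
    best_guess, best = None, 0
    for i in possibilities:
        # The set keeps one entry per distinct response to guess i.
        cnt = len({score(j, i) for j in possibilities})
        if cnt > best:
            best_guess, best = i, cnt
    return best_guess, best

print(choose_breadth(["AAAAA", "AAAAB", "AAABA"]))  # ('AAAAB', 3)
```

Here AAAAA can only produce two distinct responses (five gears or four gears), while AAAAB produces three, so AAAAB is preferred.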
Algorithm: Breadth 2
A consistent guess is one that has a chance of being correct; that is, it's consistent with the tunes we've played so far. When figuring out the next tune to play, we discard any inconsistent guesses. Well, let's see what happens when we keep them!
possibilities = all_tunes = ... # each possible tune
while possibilities.size > 1
  if possibilities.size == 16807 # first guess
    guess = "AABBC"
  else
    best = 0
    foreach i in all_tunes
      responses = [] # number of tunes giving each response; starts at 0
      foreach j in possibilities
        response = score(j, i)
        responses[response] += 1
      end
      cnt = 0
      foreach count in responses
        if count > 0
          cnt += 1
        end
      end
      if cnt > best
        best = cnt
        guess = i
      end
    end
  end
  real_score = try guess
  possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
- Average turns per tune: 4.39 (73803 turns / 16807 tunes)
- Average turn savings: 0.13
Algorithm: Knuth 5
Let's apply the same change to Knuth's algorithm, just in case it ends up being better than Breadth 2. Turns out it isn't. The "average turn savings" below is in relation to the Knuth 4 algorithm, not Breadth 2.
possibilities = all_tunes = ... # each possible tune
while possibilities.size > 1
  if possibilities.size == 16807 # first guess
    guess = "AABBC"
  else
    best_worst_case = possibilities.size + 100
    foreach i in all_tunes
      remaining = [] # number of tunes left for each response; starts at 0
      foreach j in possibilities
        response = score(j, i)
        remaining[response] += 1
      end
      worst_case = remaining.max
      if worst_case < best_worst_case
        best_worst_case = worst_case
        guess = i
      end
    end
  end
  real_score = try guess
  possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
- Average turns per tune: 4.58 (76911 turns / 16807 tunes)
- Average turn savings: 0.03
Algorithm: Breadth 3
Let's prefer to guess consistently over inconsistently when able. As of Breadth 2 we take the first tune that produces the maximum partition set size, even if that guess is inconsistent. If two tunes produce the same (maximum) partition set size, and one is consistent and the other is not, we should definitely pick the consistent one, because we might stumble upon the answer. One possible area of improvement is listing every tune that produces the maximum partition set size and picking the median one. Also, say an inconsistent tune produces a partition set size of 9 while a consistent guess produces a partition set size of 8. Should we play the consistent one?
possibilities = all_tunes = ... # each possible tune
while possibilities.size > 1
  if possibilities.size == 16807 # first guess
    guess = "AABBC"
  else
    best = 0
    best_is_consistent = false
    foreach i in all_tunes
      # a guess is consistent if it is still a possible tune
      i_is_consistent = possibilities.contains(i)
      responses = [] # number of tunes giving each response; starts at 0
      foreach j in possibilities
        response = score(j, i)
        responses[response] += 1
      end
      cnt = 0
      foreach count in responses
        if count > 0
          cnt += 1
        end
      end
      if cnt > best or (!best_is_consistent and i_is_consistent and cnt == best)
        best = cnt
        guess = i
        best_is_consistent = i_is_consistent
      end
    end
  end
  real_score = try guess
  possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
- Average turns per tune: 4.38 (73622 turns / 16807 tunes)
- Average turn savings: 0.01
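The consistent-guess tie-break can be sketched in Python as below. It is a hedged sketch: `score` and `choose_breadth` are hypothetical names, the `prefer_consistent` flag is added here only to contrast the Breadth 2 and Breadth 3 behaviours, and the candidate list is a toy example.

```python
def score(tune, guess):
    """(gears, tumblers); assumed standard Mastermind scoring."""
    gears = sum(1 for t, g in zip(tune, guess) if t == g)
    shared = sum(min(tune.count(n), guess.count(n)) for n in set(guess))
    return gears, shared - gears

def choose_breadth(all_tunes, possibilities, prefer_consistent=True):
    """Pick the candidate with the most distinct responses.

    With prefer_consistent set, a consistent candidate wins a tie
    against an inconsistent one.
    """
    still_possible = set(possibilities)
    best_guess, best, best_is_consistent = None, 0, False
    for i in all_tunes:
        cnt = len({score(j, i) for j in possibilities})
        i_is_consistent = i in still_possible
        tie_break = (prefer_consistent and not best_is_consistent
                     and i_is_consistent and cnt == best)
        if cnt > best or tie_break:
            best_guess, best, best_is_consistent = i, cnt, i_is_consistent
    return best_guess

candidates = ["AAAAC", "AAAAB", "AAABA"]
possible = ["AAAAB", "AAABA"]
print(choose_breadth(candidates, possible, prefer_consistent=False))  # AAAAC
print(choose_breadth(candidates, possible))                           # AAAAB
```

All three candidates split the two remaining tunes into two responses. Without the tie-break the first candidate, AAAAC, is played even though it cannot be the answer; with it, AAAAB is played and has a chance of ending the game on that turn.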
Algorithm: Knuth 6
Let's again apply the same change to Knuth's algorithm, just on the off-chance it ends up being better than Breadth 3. Turns out it isn't. The "average turn savings" below is in relation to the Knuth 5 algorithm, not Breadth 3.
possibilities = all_tunes = ... # each possible tune
while possibilities.size > 1
  if possibilities.size == 16807 # first guess
    guess = "AABBC"
  else
    best_worst_case = possibilities.size + 100
    best_is_consistent = false
    foreach i in all_tunes
      # a guess is consistent if it is still a possible tune
      i_is_consistent = possibilities.contains(i)
      remaining = [] # number of tunes left for each response; starts at 0
      foreach j in possibilities
        response = score(j, i)
        remaining[response] += 1
      end
      worst_case = remaining.max
      if worst_case < best_worst_case or (!best_is_consistent and i_is_consistent and worst_case == best_worst_case)
        best_worst_case = worst_case
        guess = i
        best_is_consistent = i_is_consistent
      end
    end
  end
  real_score = try guess
  possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
- Average turns per tune: 4.51 (75864 turns / 16807 tunes)
- Average turn savings: 0.06
Results
Here are the exact results of each algorithm.
- Naïve 1: Guess notes in turn, remove note if silence, keep track of tune so far
- Naïve 2: If one note left, fill rest of tune with it
- Naïve 3: If five notes seen, eliminate others
- Naïve 4: Rotate notes on gear
- Knuth 1: Eliminate impossible tunes, guess first possibility
- Knuth 2: Guess median possibility, not first
- Knuth 3: Start with AABBC not AAABB
- Knuth 4: Greedily guess best possibility
- Knuth 5: Allow inconsistent guesses
- Knuth 6: Prefer consistent guesses
- Breadth 1: Guess tune that would give the most unique answers
- Breadth 2: Allow inconsistent guesses
- Breadth 3: Prefer consistent guesses
Naïve
turns | Naïve 1 | Naïve 2 | Naïve 3 | Naïve 4 |
---|---|---|---|---|
1 | ||||
2 | ||||
3 | ||||
4 | ||||
5 | 1 | 1 | 1 | 1 |
6 | 5 | 6 | 6 | 7 |
7 | 15 | 21 | 21 | 28 |
8 | 35 | 77 | 77 | 85 |
9 | 70 | 238 | 242 | 275 |
10 | 126 | 721 | 750 | 825 |
11 | 210 | 1596 | 1632 | 1934 |
12 | 1519 | 2576 | 2617 | 2903 |
13 | 2709 | 3164 | 3196 | 3364 |
14 | 3395 | 3066 | 3076 | 2972 |
15 | 3241 | 2422 | 2405 | 2127 |
16 | 2527 | 1575 | 1539 | 1287 |
17 | 1610 | 840 | 799 | 638 |
18 | 840 | 364 | 331 | 261 |
19 | 364 | 119 | 100 | 85 |
20 | 119 | 21 | 15 | |
21 | 21 | |||
total | 246880 | 227729 | 226896 | 223060 |
average | 14.69 | 13.55 | 13.50 | 13.27 |
Knuth
turns | Knuth 1 | Knuth 2 | Knuth 3 | Knuth 4 | Knuth 5 | Knuth 6 |
---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 | 1 | 1 |
2 | 19 | 17 | 29 | 29 | 17 | 19 |
3 | 260 | 420 | 463 | 491 | 387 | 423 |
4 | 1337 | 4053 | 4656 | 6149 | 6429 | 7324 |
5 | 6975 | 8935 | 8740 | 9534 | 9839 | 8980 |
6 | 5812 | 3208 | 2752 | 597 | 134 | 60 |
7 | 2052 | 171 | 166 | 6 | ||
8 | 336 | 2 | ||||
9 | 13 | |||||
10 | 2 | |||||
total | 93103 | 82643 | 81446 | 77422 | 76911 | 75864 |
average | 5.54 | 4.92 | 4.85 | 4.61 | 4.58 | 4.51 |
Breadth
turns | Breadth 1 | Breadth 2 | Breadth 3 |
---|---|---|---|
1 | 1 | 1 | 1 |
2 | 33 | 41 | 40 |
3 | 653 | 838 | 846 |
4 | 7215 | 8608 | 8770 |
5 | 8288 | 7140 | 6977 |
6 | 607 | 179 | 173 |
7 | 10 | ||
total | 76038 | 73803 | 73622 |
average | 4.52 | 4.39 | 4.38 |