# Solving NetHack's Mastermind

Note: This article was originally written in 2006. It is published here for posterity.

NetHack's Mastermind is very similar to the real Mastermind, with three changes:

• Instead of five colors, there are seven notes (A, B, C, D, E, F, and G).
• There are five positions, not four.
• You can give partial solutions (such as AB even though the solution is five notes long).

A gear represents a note in the correct position. A tumbler represents a note in an incorrect position. So, for example, if you're trying to find the tune FFAAB and you guess AFAFB, you'll get three gears and two tumblers. If you guess CA you'll get one tumbler. If you guess BBBBB you'll get one gear. If you guess DDDDD you'll get silence (in other words, no gears and no tumblers).

This document describes my attempts at optimizing a NetHack Mastermind solver (with help from Sean Kelly and papers from Don Knuth and Barteld Kooi). The first, naïve algorithm finds the solution on average within 15 turns, possibly taking up to 21. The last, most effective algorithm finds the solution on average within 4 turns, possibly taking up to only 6. This is an important result because in NetHack, the level where you play Mastermind is one of the most dangerous, so you have a strong incentive (your own survival) to find the solution as quickly as possible.

## Algorithm: Naïve 1

The first tune we always test is A. If that's a gear, then we're done with the first position. We don't know if any more As are in the tune, so our next stab is AA. Back to the first play though: if A produces a tumbler, then we skip it for now and move on to B. If we get silence, then we know A is not in the tune at all, so we remove it from any further consideration. Repeat this until we have found the tune.

This basic algorithm is straightforward and works well enough. This is roughly what I do when I am playing NetHack, since it can be done without assistance.

So let's use this algorithm to try to find a particular tune:

1. A: one tumbler (we know there's an A but not in the first position, so skip A for a while)
2. B: one tumbler (ditto. skip B for a while)
3. C: one gear (great! first position is C. Try C again)
4. CC: one gear (now we know there are no more Cs in this tune)
5. CD: two gears (second position is D. try D again)
6. CDD: two gears (now we know there are no more Ds in this tune)
7. CDE: two gears (now we know there are no Es in this tune)
8. CDF: three gears (third position is F. try F again)
9. CDFF: three gears (so we know there are no more Fs in this tune)
10. CDFG: three gears (so we know there are no Gs in this tune)
11. CDFA: three gears, one tumbler (we know there's an A but not in the fourth position, so skip A for a while)
12. CDFB: four gears (fourth position is B. try B again)
13. CDFBB: four gears (so we know there are no more Bs in this tune)
14. CDFBA: five gears (nailed it!)

This first algorithm looks like this in pseudocode:

```  notes = A, B, C, D, E, F, G
tune  = ""

while tune.length < 5
try tune + notes

if tumblers
# move this note to the end of the note list
# we're guaranteed to see the correct note before we see this one again
note = notes.delete_at
notes.push(note)
elsif gears > tune.length
# correct note
tune += notes
else
notes.delete_at
end
```

Here are its results:

• Average turns per tune: 14.69 (246880 turns to solve all tunes / 16807 tunes)

## Algorithm: Naïve 2

The first optimization we can make is that, if at any time we have exactly one possible note left, the rest of the tune most consist solely of that note.

```  notes = A, B, C, D, E, F, G
tune  = ""

while tune.length < 5
if notes.size == 1
tune += notes while tune.length < 5
last
end

try tune + notes

if any tumblers
# move this note to the end of the note list
# we're guaranteed to see the correct note before we see this one again
notes.push(notes.delete_at)
elsif gears > tune.length
# correct note
tune += notes
else
notes.delete_at
end
```
• Average turns per tune: 13.55 (227729 turns / 16807 tunes)
• Average turn savings: 1.14

## Algorithm: Naïve 3

The next savings (which will be minor compared to the previous one) is if we've seen five distinct notes, we can rule out any notes not among those five:

```  notes = A, B, C, D, E, F, G
tune  = ""
seen  = nil

while tune.length < 5
if seen.size == 5
foreach element in notes
delete it if it's not in seen
end
end

if notes.size == 1
tune += notes while tune.length < 5
last
end

try tune + notes

if any tumblers
# move this note to the end of the note list
# we're guaranteed to see the correct note before we see this one again
notes.push(notes.delete_at)
elsif gears > tune.length
# correct note
tune += notes
else
notes.delete_at
end
```
• Average turns per tune: 13.50 (226896 turns / 16807 tunes)
• Average turn savings: 0.05

## Algorithm: Naïve 4

So much for the obvious optimizations. Let's try tweaking some details of the algorithm a bit just to see if it helps or hurts.

You know how when we get a tumbler we move the first note to the last position? Let's try doing that when we get a gear, too. Turns out this saves us almost .25 turns.

```  notes = A, B, C, D, E, F, G
tune  = ""
seen  = nil

while tune.length < 5
if seen.size == 5
foreach element in notes
delete it if it's not in seen
end
end

if notes.size == 1
tune += notes while tune.length < 5
last
end

try tune + notes

if any tumblers
# move this note to the end of the note list
# we're guaranteed to see the correct note before we see this one again
notes.push(notes.delete_at)
elsif gears > tune.length
# correct note
tune += notes
notes.push(notes.delete_at)
else
notes.delete_at
end
```
• Average turns per tune: 13.27 (223060 turns / 16807 tunes)
• Average turn savings: 0.23

## Algorithm: Knuth 1

Due to arcanehl's constant (yet ever friendly) prodding, I picked this code up again. He started implementing Knuth's Mastermind algorithm, which, while slower, is much more effective. The average turns to solve a tune goes down from 13 to just 5!

It's a radical change from the algorithm used above, so allow me to explain. The basic idea is after a guess, we eliminate any possibilities that would not produce the same score. For example, if we try AAABB and get 2 gears, 0 tumblers, then we can eliminate any possibility that does not produce 2 gears and 0 tumblers against AAABB (such as CCCCC which would produce 0 gears and 0 tumblers). The initial guess is hardcoded to be AAABB though it could be something entirely different. The next guess is (currently) taken arbitrarily; we use the first possibility.

I changed from Perl to C during testing for speed: arcanehl's Ruby implementation of this algorithm takes hours, mine takes less than twenty-five seconds. Of course, these are the results of timing every possible tune (there are 16807 of them). Going through the algorithm for just one tune will be fast enough at a hundredth the speed.

```    possibilities = ... # each possible tune
while possibilities.size > 1
if possibilities.size == 16807  # first guess
guess = "AAABB"
else
guess = possibilities
end

real_score = try guess

possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
```
• Average turns per tune: 5.5 (93103 turns / 16807 tunes)
• Average turn savings: 7.7

## Algorithm: Knuth 2

About the only possible improvement we can make is to generate a better guess than always using the first possible tune. I figured the median tune (or close enough to it) would be better than the first, and I was right, saving over half a turn.

```    possibilities = ... # each possible tune
while possibilities.size > 1
if possibilities.size == 16807  # first guess
guess = "AAABB"
else
guess = possibilities[possibilities.size/2]
end

real_score = try guess

possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
```
• Average turns per tune: 4.92 (82643 turns / 16807 tunes)
• Average turn savings: 0.62

## Algorithm: Knuth 3

Mysteriously, we find that the best initial tune is AABBC, not AAABB. (that's called foreshadowing)

This is the algorithm that is used in Rodney3. To save memory, all he remembers for each player is the results of each tune. Naturally there's a fair amount of reprocessing, but a 16807-element array would be kind of large to save for each person (and I'm afraid of Perl bitfields).

```    possibilities = ... # each possible tune
while possibilities.size > 1
if possibilities.size == 16807  # first guess
guess = "AABBC"
else
guess = possibilities[possibilities.size/2]
end

real_score = try guess

possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
```
• Average turns per tune: 4.85 (81446 turns / 16807 tunes)
• Average turn savings: 0.07

## Algorithm: Knuth 4

Enough goofing around. Time to implement the rest of Knuth's algorithm: at each step, guess the possibility that would eliminate the most possibilities in the worst case. This means that no matter what the response is for the guess, we'll eliminate a maximum number of possibilities. The way I implement it has time complexity O(n^2) where n is the number of tunes left; I don't know if there's a more efficient solution. For each guess i we score each possible tune j. The worst case for i is equal to the largest number of remaining tunes after playing it (given all possible responses, such as three tumblers and one gear). Then we find the smallest worst case (ie, smallest remaining number of tunes) over all i and we play it!

This takes hours even in my somewhat tuned C implementation. Using the median element as above may be more appropriate if you're playing all 16807 tunes, since this extra processing is probably not worth the turn savings. Otherwise, if you're just finding one tune as in an actual game of NetHack, you'll of course want to use the best algorithm available.

Note: we found the best first tune (AABBC) by running this code without the hardcoded first guess. The code obviously produces the same results without the hardcoded first guess, but there's no reason to do all that processing (which is a lot for all 16807 tunes) when we know the first best guess is always AABBC.

```    possibilities = ... # each possible tune
while possibilities.size > 1
if possibilities.size == 16807  # first guess
guess = "AABBC"
else
best_worst_case = possibilities.size + 100
foreach i in possibilities
remaining = []

foreach j in possibilities
score = score(j, i)
remaining[score] += 1
end

worst_case = remaining.max
if worst_case < best_worst_case
best_worst_case = worst_case
guess = i
end
end
end

real_score = try guess

possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
```
• Average turns per tune: 4.61 (77422 turns / 16807 tunes)
• Average turn savings: 0.24

## Algorithm: Breadth 1

I was talking in #nethack and someone brought up Mastermind. So that revived this whole thing. I was idly searching on the arXiv for Mastermind papers, and only found the one that proves Mastermind is NP-complete, which while good to know, doesn't directly help me improve my solving algorithms. However, I then remembered Google Scholar and searched for Knuth's paper. I couldn't find it, but it did return a few dozen papers that referenced Knuth's; including one by a Mr. Barteld Kooi. The paper, "Yet Another Mastermind Strategy," is available here. The gist of the algorithm is that instead of taking the best worst case scenario (as we do in Knuth's algorithm), take the tune that would produce the largest number of unique responses. So a tune that can return six unique responses is valued over a tune that can only return three unique responses. In the author's words, "[In this strategy, o]ne only looks at the 'breadth' of a partition. On the other side of the spectrum one finds Knuth's worst case strategy, which only looks at the 'depth' of a partition." This next algorithm implements exactly that; looking at only the breadth.

Also, I've rerun the algorithm without the hardcoded first guess, but it guessed AABBC anyway, which is a convenient result.

```    possibilities = ... # each possible tune
while possibilities.size > 1
if possibilities.size == 16807  # first guess
guess = "AABBC"
else
best = 0
foreach i in possibilities
responses = []

foreach j in possibilities
score = score(j, i)
responses[score] += 1
end

cnt = 0
foreach response in responses
if response > 0
cnt += 1
end
end

if cnt > best
best = cnt
guess = i
end
end

real_score = try guess

possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
```
• Average turns per tune: 4.5 (76038 turns / 16807 tunes)
• Average turn savings: 0.08

## Algorithm: Breadth 2

A consistent guess is one that has a chance of being correct; that is, it's consistent with the tunes we've played so far. When figuring out the next tune to play, we discard any inconsistent guesses. Well, let's see what happens when we keep them!

```    possibilities = all_tunes = ... # each possible tune
while possibilities.size > 1
if possibilities.size == 16807  # first guess
guess = "AABBC"
else
best = 0
foreach i in all_tunes
responses = []

foreach j in possibilities
score = score(j, i)
responses[score] += 1
end

cnt = 0
foreach response in responses
if response > 0
cnt += 1
end
end

if cnt > best
best = cnt
guess = i
end
end
end

real_score = try guess

possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
```
• Average turns per tune: 4.4 (73803 turns / 16807 tunes)
• Average turn savings: 0.13

## Algorithm: Knuth 5

Let's apply the same change to Knuth's algorithm, just in case it ends up being better than Breadth 2. Turns out it isn't. The "average turn savings" below is in relation to the Knuth 4 algorithm, not Breadth 2.

```    possibilities = all_tunes = ... # each possible tune
while possibilities.size > 1
if possibilities.size == 16807  # first guess
guess = "AABBC"
else
best_worst_case = possibilities.size + 100
foreach i in all_tunes
remaining = []

foreach j in possibilities
score = score(j, i)
remaining[score] += 1
end

worst_case = remaining.max
if worst_case < best_worst_case
best_worst_case = worst_case
guess = i
end
end
end

real_score = try guess

possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
```
• Average turns per tune: 4.58 (76911 turns / 16807 tunes)
• Average turn savings: 0.03

## Algorithm: Breadth 3

Let's prefer to guess consistently over inconsistently when able. As of Breadth 2 we take the first tune that produces the maximum partition set size, even if that guess is inconsistent. If two tunes produce the same (maximum) partition set size, and one is consistent and the other is not, we should definitely pick the consistent one, because we might stumble upon the answer. One possible area of improvement is listing every tune that produces the maximum partition set size and picking the median one. Also, say an inconsistent tune produces a partition set size of 9 while a consistent guess produces a partition set size of 8. Should we play the consistent one?

```    possibilities = all_tunes = ... # each possible tune
while possibilities.size > 1
if possibilities.size == 16807  # first guess
guess = "AABBC"
else
best = 0
best_is_consistent = false

foreach i in all_tunes
responses = []

foreach j in possibilities
score = score(j, i)
responses[score] += 1
end

cnt = 0
foreach response in responses
if response > 0
cnt += 1
end
end

if cnt > best or (!best_is_consistent and i_is_consistent and cnt == best)
best = cnt
guess = i
best_is_consistent = i_is_consistent
end
end
end

real_score = try guess

possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
```
• Average turns per tune: 4.38 (73622 turns / 16807 tunes)
• Average turn savings: 0.01

## Algorithm: Knuth 6

Let's again apply the same change to Knuth's algorithm, just on the off-chance it ends up being better than Breadth 3. Turns out it isn't. The "average turn savings" below is in relation to the Knuth 5 algorithm, not Breadth 3.

```    possibilities = all_tunes = ... # each possible tune
while possibilities.size > 1
if possibilities.size == 16807  # first guess
guess = "AABBC"
else
best_worst_case = possibilities.size + 100
best_is_consistent = false

foreach i in all_tunes
remaining = []

foreach j in possibilities
score = score(j, i)
remaining[score] += 1
end

worst_case = remaining.max
if worst_case < best_worst_case or (!best_is_consistent and i_is_consistent and worst_case == best_worst_case)
best_worst_case = worst_case
guess = i
best_is_consistent = i_is_consistent
end
end
end

real_score = try guess

possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
end
```
• Average tunes per turn: 4.51 (75864 turns / 16807 tunes)
• Average turn savings: 0.06

## Results

Here are the exact results of each algorithm.

1. Naïve 1: Guess notes in turn, remove note if silence, keep track of tune so far
2. Naïve 2: If one note left, fill rest of tune with it
3. Naïve 3: If five notes seen, eliminate others
4. Naïve 4: Rotate notes on gear
5. Knuth 1: Eliminate impossible tunes, guess first possibility
6. Knuth 2: Guess median possibility, not first
7. Knuth 3: Start with AABBC not AAABB
8. Knuth 4: Greedily guess best possibility
9. Knuth 5: Allow inconsistent guesses
10. Knuth 6: Prefer consistent guesses
11. Breadth 1: Guess tune that would give the most unique answers
12. Breadth 2: Allow inconsistent guesses
13. Breadth 3: Prefer consistent guesses

### Naïve

turns Naïve 1 Naïve 2 Naïve 3 Naïve 4
1
2
3
4
5 1 1 1 1
6 5 6 6 7
7 15 21 21 28
8 35 77 77 85
9 70 238 242 275
10 126 721 750 825
11 210 1596 1632 1934
12 1519 2576 2617 2903
13 2709 3164 3196 3364
14 3395 3066 3076 2972
15 3241 2422 2405 2127
16 2527 1575 1539 1287
17 1610 840 799 638
18 840 364 331 261
19 364 119 100 85
20 119 21 15
21 21
total 246880 227729 226896 223060
average 14.69 13.55 13.50 13.28

### Knuth

turns Knuth 1 Knuth 2 Knuth 3 Knuth 4 Knuth 5 Knuth 6
1 1 1 1 1 1 1
2 19 17 29 29 17 19
3 260 420 463 491 387 423
4 1337 4053 4656 6149 6429 7324
5 6975 8935 8740 9534 9839 8980
6 5812 3208 2752 597 134 60
7 2052 171 166 6
8 336 2
9 13
10 2
total 93103 82643 81446 77422 76911 75864
average 5.54 4.92 4.85 4.61 4.58 4.51