How to play Rock, Paper, Scissors optimally: the Nash equilibrium

Rock, Paper, Scissors (RPS) is a very simple game. So simple, in fact, it’s almost dull! However, it may be more interesting than you realise.

We all think we know how to play the game “correctly”, based on intuition alone. You simply choose your strategy randomly and hope to get lucky.

But how do we implement such a strategy in practice? If we had a calculator to hand, we could use the random number generator (RNG) function to generate a decimal between 0 and 1 and use the following rule to make our decisions:

0.00 – 0.33: play R
0.33 – 0.67: play P
0.67 – 1.00: play S
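The rule above translates almost line for line into code. This is a minimal sketch in Python, assuming the built-in `random` module stands in for the calculator's RNG button; the function name `rng_move` is made up for illustration.

```python
import random

def rng_move(rng: random.Random) -> str:
    """Map a uniform draw in [0, 1) to a move, one third of the interval each."""
    x = rng.random()  # uniform float with 0.0 <= x < 1.0
    if x < 1 / 3:
        return "R"
    if x < 2 / 3:
        return "P"
    return "S"

if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run is reproducible
    print([rng_move(rng) for _ in range(10)])
```

Using exact thirds rather than rounded bounds such as 0.33 keeps the three moves equally likely.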

We’ll call this the RNG strategy; in game-theory terms it is the Nash equilibrium of RPS, the strategy that no opponent can exploit. I suggest you try it the next time you and your wife are deciding whose turn it is to make a cup of tea, if only for the reaction you’ll get. An important property of the pseudo-random RNG strategy is that it cannot lose (in the long term). By that I mean that, as you play more and more games, your win:draw:lose ratio will stabilise at 1/3:1/3:1/3, and your victories and losses will cancel each other out.

This is a common focus of Game Theory analysis. We accept that we cannot control short-term luck and instead turn our attention to long-term results using the law of large numbers, considering what would happen if we used this strategy over millions of games.
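To watch the law of large numbers at work, here is a rough Monte Carlo sketch in Python. It assumes a fair pseudo-random generator (`random.Random`); the names `simulate` and `biased_opponent`, and the opponent's 50/30/20 split, are hypothetical choices for illustration only.

```python
import random
from collections import Counter

# Each key beats the move it maps to: Rock blunts Scissors, and so on.
BEATS = {"R": "S", "P": "R", "S": "P"}

def outcome(ours: str, theirs: str) -> str:
    """Result of one game from our point of view."""
    if ours == theirs:
        return "draw"
    return "win" if BEATS[ours] == theirs else "lose"

def simulate(n_games: int, opponent, seed: int = 0) -> Counter:
    """Play n_games using the uniform RNG strategy against `opponent`.

    `opponent` is any function that takes a random.Random and returns
    "R", "P" or "S"; it is free to ignore the generator entirely.
    """
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_games):
        ours = rng.choice("RPS")  # each move with probability 1/3
        tally[outcome(ours, opponent(rng))] += 1
    return tally

def biased_opponent(rng: random.Random) -> str:
    # Hypothetical opponent: 50% Rock, 30% Paper, 20% Scissors.
    return rng.choices("RPS", weights=[5, 3, 2])[0]

if __name__ == "__main__":
    n = 100_000
    tally = simulate(n, biased_opponent)
    for result in ("win", "draw", "lose"):
        print(result, round(tally[result] / n, 3))
    print("net at £1 a game:", tally["win"] - tally["lose"])
```

Swapping in any other opponent function should leave the three printed ratios hovering around 0.333, while the net total wanders around zero.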

There’s an interesting paradox hidden within the claim we just made, though. Notice that we made no mention of our opponent’s strategy. Our ratio will converge to 1/3:1/3:1/3, regardless of what our opponent does.

To get an intuition for why this is true, imagine for a moment that you’re playing against the most predictable opponent in the world: they always choose Rock. If we persist with the RNG strategy, then the following will happen:

1/3 of the time we play Rock vs Rock (draw)
1/3 of the time we play Paper vs Rock (win)
1/3 of the time we play Scissors vs Rock (lose)

In the long term, our ratio is still 1/3:1/3:1/3, as we predicted. The amazing thing here is that it does not matter what ratios our opponent decides to use; our long-term results are exactly the same. For those of you familiar with probability trees, you can check this for yourself by pitting your RNG strategy against an opponent who uses the following ratios:

Rock: 4/9
Paper: 2/9
Scissors: 3/9
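For readers who would rather let a computer draw the probability tree, the sketch below multiplies out every branch exactly using Python's `fractions` module. The function name `result_probabilities` is hypothetical; the opponent's ratios are the ones listed above.

```python
from fractions import Fraction as F

BEATS = {"R": "S", "P": "R", "S": "P"}  # key beats value

def result_probabilities(ours: dict, theirs: dict) -> dict:
    """Exact win/draw/lose probabilities for two independent mixed strategies."""
    probs = {"win": F(0), "draw": F(0), "lose": F(0)}
    for a, p in ours.items():
        for b, q in theirs.items():  # one branch of the probability tree
            if a == b:
                key = "draw"
            elif BEATS[a] == b:
                key = "win"
            else:
                key = "lose"
            probs[key] += p * q
    return probs

rng_strategy = {m: F(1, 3) for m in "RPS"}
opponent = {"R": F(4, 9), "P": F(2, 9), "S": F(3, 9)}
print(result_probabilities(rng_strategy, opponent))  # every entry is Fraction(1, 3)
```

Replacing `opponent` with `{"R": F(1)}` reproduces the "always Rock" case, again giving 1/3 for each outcome.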

Why perfection loses money

Some of the previous examples should be raising alarm bells. When one player uses the RNG strategy, it literally does not matter what the opponent does. They could be using the same RNG strategy, they could be playing Rock every time, or they could be doing anything in between. Our long-term expectation is the same: we’ll win about as much as we lose.

To frame this phenomenon a bit more mathematically, imagine you win £1 every time you win and lose £1 every time you lose (nothing happens when you draw). Then, in the long term, your expected winnings are exactly £0. However, there is a subtlety to consider: the RNG strategy actually loses money.
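The same conclusion falls out of a short calculation. Writing p_i for our probabilities, q_j for the opponent's and A_ij for our payoff in £ when we play i and they play j (notation introduced here for illustration), the expected winnings per game are:

```latex
A = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix}
\qquad \text{(rows: our R, P, S; columns: their R, P, S)}

\mathbb{E}[\text{payoff}] = \sum_{i}\sum_{j} p_i\, A_{ij}\, q_j
\;\overset{p_i = 1/3}{=}\; \frac{1}{3} \sum_{j} q_j \Big( \sum_{i} A_{ij} \Big)
= \frac{1}{3} \sum_{j} q_j \cdot 0 = 0
```

Every column of A contains one win, one draw and one loss, so it sums to zero, which is why the opponent's q_j drop out entirely.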

Imagine the following scenario. You play a million games of RPS over your lifetime for £1 a game (you’re the world’s most boring gambling addict). At the end of your thrilling career, you have won roughly £0. But how much time did you spend playing? And what was the opportunity cost of that time? Could you have been doing something more productive with your time, something that actually made money?

A selection of sub-£1 denominations of British coins. Image by Kelvin Stuttard, source: Pixabay.

This concept of opportunity cost is important, and it comes up all the time.

I was once asked to do an hour of online maths tutoring on a Sunday morning for £40. I declined because the patio needed de-weeding, and I wanted to save money – I wasn’t going to pay a handyman £20 for something I could do myself! The job took about an hour and my back was aching by the end of it. Did I save money? Well, yes, I spent £0 instead of £20. But I could have paid the £20 and used that hour to do the maths tutoring, leaving me with a net profit of £20 and the ability to stand up straight.

On the face of it, the conclusion is simple: don’t waste your time playing RPS. Or is there more to it?

Why you should be making mistakes

We’ve seen that when one of the players (or both) uses the RNG strategy, RPS is a break-even game (and a losing one once you consider the opportunity cost). The reality, though, is that nobody is using the RNG strategy. Even if you know about the strategy, actually implementing it is very difficult without the use of a calculator. Just try playing RPS a hundred times in a row while trying to be as random as possible. It’s mentally exhausting and, no matter how hard you try, you’ll eventually fall into a predictable pattern. In fact, according to the WRPSA (World Rock Paper Scissors Association), mass data analysis (MDA) suggests that, on average, people play Rock too often.

Something like the following is typical for someone trying to replicate a random strategy:

Rock: 37%
Paper: 30%
Scissors: 33%

We’ll call this the MDA strategy.

The necessary response is simple. Our opponent is biased towards choosing Rock, so we should be biased towards choosing Paper. And if you’re playing a one-off game against a new opponent, this is what I recommend you do: just play Paper. In fact, we can calculate our expected payoff when we use this maximal exploitative strategy versus the MDA strategy. Always playing Paper wins whenever they play Rock (37% of the time) and loses whenever they play Scissors (33%), so we can expect to win, on average, £0.37 − £0.33 = £0.04 per game. It’s not much, but it’s certainly better than the £0 we expected when our opponent played perfectly!
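The calculation generalises to any pair of frequencies. The following Python sketch is one way to do it, assuming the £1-a-game payoffs from earlier; `expected_payoff` and `best_response` are illustrative names, and treating the "maximal exploitative strategy" as the single best pure move is our reading of the text.

```python
# Our payoff in £ for (our move, their move) at £1 a game.
PAYOFF = {
    ("R", "R"): 0, ("R", "P"): -1, ("R", "S"): 1,
    ("P", "R"): 1, ("P", "P"): 0, ("P", "S"): -1,
    ("S", "R"): -1, ("S", "P"): 1, ("S", "S"): 0,
}

def expected_payoff(ours: dict, theirs: dict) -> float:
    """Expected £ per game when both players use fixed mixed strategies."""
    return sum(p * q * PAYOFF[a, b] for a, p in ours.items() for b, q in theirs.items())

def best_response(theirs: dict) -> str:
    """The single move with the highest expected payoff (the maximal exploit)."""
    return max("RPS", key=lambda move: expected_payoff({move: 1.0}, theirs))

mda = {"R": 0.37, "P": 0.30, "S": 0.33}
move = best_response(mda)
print(move, round(expected_payoff({move: 1.0}, mda), 4))  # prints: P 0.04
```

The same function reproduces the other figures in this article: playing Paper every time against a 30/30/40 opponent comes out at about -0.10, and the 30/45/25 mix against the MDA strategy at about 0.0095.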

Is this exploitative strategy a mistake? Well, the answer to that is nuanced. On the one hand, it maximises our expected payoff versus our opponent’s mistake. On the other hand, it is a deviation from RNG, and can be counter-exploited by our opponents (if they know we’re doing it). In fact, we’re about to see exactly how this “mistake” can be punished.

How mistakes can lose even more money

As we said, Game Theory is concerned with the long term. What would happen if we played like this against the same person for 100 consecutive games? It won’t take them long to realise we’re playing Paper every single time, and just like that, our advantage is lost. Even the most unobservant opponents are capable of adjusting to such obvious deviations and, when they do, they may be subtle about it. If they realise what we’re doing and adjust their frequencies even slightly, to something like:

Rock: 30%
Paper: 30%
Scissors: 40%

Then our expected payoff plummets to -£0.10 per game; we’re now losing money!
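Spelling out the arithmetic (in £ per game), with our Paper beating their Rock, drawing with their Paper and losing to their Scissors:

```latex
\mathbb{E}[\text{always Paper}] = 0.30\,(+1) + 0.30\,(0) + 0.40\,(-1) = -0.10
```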

And this simple example sums up the problem with exploitation. By deviating from the safe confines of the RNG strategy, we have the potential to boost our win rate from zero to hero. But, take it too far and our opponents will adjust and we’re suddenly losing money.
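How might an opponent "realise" what we are doing? One hypothetical approach, sketched below in Python, is to keep a tally of our past moves and reply with whatever scores best against that tally, which is the idea behind the classic learning rule known as fictitious play. The names are made up for illustration, and a real opponent would likely shift more gently, as in the 30/30/40 example above.

```python
from collections import Counter

BEATS = {"R": "S", "P": "R", "S": "P"}  # key beats value

def payoff(ours: str, theirs: str) -> int:
    """+1 for a win, 0 for a draw, -1 for a loss."""
    if ours == theirs:
        return 0
    return 1 if BEATS[ours] == theirs else -1

def frequency_counter(our_history: list) -> str:
    """Hypothetical observant opponent: best-respond to our observed frequencies."""
    counts = Counter(our_history)
    # Score each reply by its total payoff against everything we've played so far.
    return max("RPS", key=lambda reply: sum(n * payoff(reply, m) for m, n in counts.items()))

history = ["P"] * 20  # we have played Paper twenty times in a row
print(frequency_counter(history))  # S
```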

Why sub-optimal mistakes will make you rich

Our mistake was to take things too far. I believe there’s a sweet spot somewhere between the RNG strategy (no risk, no reward) and the maximal exploitative strategy (high risk, high reward). I would call this sweet spot a marginal exploitative strategy (low risk, low reward). Something like the following might be one such choice when playing against the MDA strategy:

Rock: 30%
Paper: 45%
Scissors: 25%

We still lean towards Paper, but more conservatively, and our expected payoff drops to a measly £0.0095 per game, or about one penny! To put that in context, we need 100 games to win approximately £1!
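For anyone who wants to check the penny, the expected payoff (in £ per game) can be broken down by our choice of move, weighting each by how often it wins minus how often it loses against the MDA frequencies:

```latex
\mathbb{E} = \underbrace{0.30\,(0.33 - 0.30)}_{\text{our Rock}}
 + \underbrace{0.45\,(0.37 - 0.33)}_{\text{our Paper}}
 + \underbrace{0.25\,(0.30 - 0.37)}_{\text{our Scissors}}
 = 0.0090 + 0.0180 - 0.0175 = 0.0095
```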

Is this suggestion the perfect balance between risk and reward? Who knows! That’s the beautiful grey area in exploitative Game Theory. How far can we push our luck before we run a serious risk of our opponent realising and adapting? I would suggest that, for a game like RPS, we can’t push our luck very far.

Is all this effort worth it for a penny? That depends on a few things such as:

Opportunity cost – could we make more money elsewhere with less time and effort?
Size of the game – what if we played for £10 per game, or £100 per game? Our expected payoff becomes £0.095 per game and £0.95 per game, respectively.
Strength of opponent – the MDA strategy isn’t perfect, but it’s not far off. What if the particular person in front of you is even further away from RNG? Our marginal exploitative strategy would then generate a higher expected payoff.
Our skill edge – it’s not enough for our opponent to make mistakes. We need to have the clarity of thought to be able to identify: what mistakes they’re making; how to exploit them; how much to exploit them.

RPS vs NLHE

An important feature of RPS is its simplicity. By that I mean it’s very easy for our opponents to defend themselves. There’s only one decision to make when they play, and it’s the same decision every time. As such, there’s only one strategy to learn (RNG), and it’s relatively simple to implement something approximating it. Sure, they’ll make small mistakes, and we can exploit them a little bit, but our profit margins are so thin that it’s probably not in our interest to play when we consider opportunity costs.

Someone considers their next move with the deck of cards. Image by tookapic on Pixabay.

Compare this now to one of my favourite games: No Limit Hold’em (NLHE) poker. There is no longer a single decision point (node) for our opponents to think about. There are in fact thousands of them spread across the entire game tree. Memorising (and implementing) the appropriate un-exploitable strategy at each node is simply not humanly possible. Not only that, but implementing something even approximating the optimal strategy is incredibly difficult for all but the strongest professionals.

Recreational poker players are, quite simply, making huge mistakes on a consistent basis. But how do we exploit them across thousands of different nodes, and how far can we push our luck (maximal vs marginal exploitation)? The over-simplified answer to these two questions is:

Mass Data Analysis reveals consistent patterns in the types of mistakes poker players make, generating useful heuristics to guide our exploitative strategies.
Each of the thousands of nodes occurs infrequently enough that we can use maximally exploitative strategies across the board with very little risk of being detected (in live poker at least, where players don’t have access to a HUD, the heads-up display that shows statistics on opponents).

The end result is that poker can be very lucrative for a dedicated professional who is willing to invest the time and hard work required to learn these exploitative heuristics. As with all games, there is short-term variance (high variance, in fact, for live poker). But, in the long term, it’s possible to make good money. At 5/5 live 200 cap in the USA, the strongest players can expect to make close to $100 per hour. Once their bankroll is big enough to handle the variance, they can move up in stakes and push that win rate higher and higher.

Conclusion

If you’re playing a one-off game of RPS, play Paper! You’re not guaranteed to win, but you give yourself the best chance. However, you probably shouldn’t take up professional RPS unless you can consistently play against very weak players for large amounts of money.

The more complicated the game, the more we can push our luck and play close to maximal exploitative strategies – it’s simply too difficult for our opponents to realise what we’re doing.
