Machine Learning

Professional Poker Players vs AI

Competing effectively in poker has long been a difficult task for AI. Recently, advances in reinforcement learning have allowed AI bots to compete effectively in multiplayer settings, and even to teach the best human players new tricks.

The game of poker has been a challenging problem in the field of Artificial Intelligence (AI) for years. While AI systems have had previous success at beating humans in games that are non-random and follow predefined rules (such as chess and Go), winning at a game of poker has proven to be more challenging because it requires reasoning based on hidden information.

Over the past two decades, we have seen steady progress in the ability of AI systems to play and win various forms of poker. This includes ‘DeepStack’ and ‘Libratus’, developed at the University of Alberta in Edmonton, Canada and Carnegie Mellon University in Pittsburgh, USA, respectively. These systems were effective; however, they were limited to settings involving only two players. Developing AI for multiplayer poker was widely recognized as the major remaining milestone.

Pluribus: An AI Poker-Playing Bot

Pluribus is an AI poker bot built and developed by Carnegie Mellon University in collaboration with Facebook's AI Lab. Pluribus plays no-limit Texas hold 'em poker and is widely known as ‘the first bot to beat humans in a complex multiplayer competition’, signifying a key milestone in AI.

Description of Pluribus

As with many recent AI game-playing breakthroughs, Pluribus used reinforcement learning, a machine learning paradigm, to model and master multiplayer poker. Pluribus developed its strategy through self-play, in which the bot played against copies of itself, without any data from prior human or AI play used as input. In essence, Pluribus started by playing randomly and gradually improved as it determined which actions, and which probability distribution over those actions, led to better outcomes against earlier versions of its strategy.
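The self-play idea can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not Pluribus's actual training code: two copies of a simple regret-matching learner play rock-paper-scissors against each other, both starting from random play, and their average strategies drift toward the game's equilibrium.

```python
# Toy self-play sketch: two copies of the same regret-matching learner
# play rock-paper-scissors against each other, starting from random play.
# Rock-paper-scissors stands in for poker purely for illustration;
# Pluribus's actual training is vastly more elaborate.
import random

random.seed(0)
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player: rock, paper, scissors

def strategy_from(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    return [p / total for p in positives] if total else [1 / 3] * 3

regrets = [[0.0] * 3, [0.0] * 3]       # cumulative regrets, one list per player
strategy_sum = [[0.0] * 3, [0.0] * 3]  # running totals for the average strategy

for _ in range(20000):
    strats = [strategy_from(r) for r in regrets]
    moves = [random.choices(range(3), weights=s)[0] for s in strats]
    for p in range(2):
        opp = moves[1 - p]
        realized = PAYOFF[moves[p]][opp]
        for a in range(3):
            # Regret: what action a would have earned minus what we earned.
            regrets[p][a] += PAYOFF[a][opp] - realized
            strategy_sum[p][a] += strats[p][a]

total = sum(strategy_sum[0])
avg = [s / total for s in strategy_sum[0]]
print(avg)  # drifts toward the equilibrium strategy, roughly uniform
```

Each learner starts uniform-random (no positive regret yet) and adjusts toward whatever it regrets not having played, which is the same "improve against earlier versions of yourself" loop described above, in miniature.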

Pluribus in Action

Pluribus uses a technique called ‘abstraction’ during its decision-making process.

Abstraction is defined as: ‘the process of removing physical, spatial, or temporal details or attributes in the study of objects or systems to focus attention on details of greater importance’.

Abstraction

Abstraction is important to Pluribus’s decision-making process because multiplayer Texas hold 'em poker has far too many decision points to reason about individually. Abstraction groups similar actions and decisions together and eliminates lesser ones, which reduces the scope of each decision. Pluribus’s current version has two types of abstraction embedded in its decision-making process:

  1. Information abstraction: Here, information that has already been revealed is used to group decision points together. For example, Pluribus would see a jack-high flush and a ten-high flush as similar hands and apply the same decision-making logic to both.
  2. Action abstraction: Here, the number of distinct actions considered by Pluribus is reduced. For example, the decisions to bet $80 and $81 may not differ strategically, so Pluribus considers only a few distinct bet sizes at each decision point.
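Action abstraction can be sketched in a few lines. The pot-relative bet sizes below are hypothetical examples chosen for illustration, not Pluribus's actual abstraction:

```python
# A minimal sketch of action abstraction: snap an arbitrary bet to the
# nearest of a small set of allowed sizes. The candidate sizes here
# (half pot, pot, double pot) are illustrative assumptions.

def abstract_bet(bet: float, pot: float) -> float:
    """Map an arbitrary bet to the nearest of a few allowed bet sizes."""
    candidates = [0.5 * pot, 1.0 * pot, 2.0 * pot]
    return min(candidates, key=lambda size: abs(size - bet))

# Bets of $80 and $81 into a $100 pot map to the same abstract action,
# so the strategic reasoning for both collapses into one decision point.
print(abstract_bet(80, 100))  # 100.0
print(abstract_bet(81, 100))  # 100.0
```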

Iterative Monte Carlo Algorithm

Pluribus also uses a version of an iterative Monte Carlo algorithm (a form of Monte Carlo counterfactual regret minimization). A simplified explanation of this decision-making technique is as follows:

  • At the start of each iteration (which coincides with the start of a hand), the algorithm selects one player as the ‘traverser’, whose strategy will be updated on that iteration. It then simulates a complete hand of poker based on every player’s current strategy (all strategies are initialised randomly).
  • Once the hand is complete, the algorithm revisits each decision made by the ‘traverser’ and evaluates how it would have fared had it chosen each of the other available actions instead.
  • Pluribus then evaluates the merits of each hypothetical decision that would have followed each of those alternative actions, throughout the rest of the hand.
  • The difference between what the ‘traverser’ could have achieved with each action and what it actually achieved is used to update the ‘traverser’s’ strategy for that iteration.
  • On the next iteration, a new ‘traverser’ is randomly selected, and its strategy is updated in turn.

Over time, this process is how Pluribus refines its strategy into one strong enough to defeat its opponents.
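The regret update at a single decision point can be sketched as follows. The action names and counterfactual values below are made-up numbers for illustration, not taken from Pluribus:

```python
# Hedged sketch of a single regret update, the core of the iterative
# Monte Carlo procedure described above: compare what each alternative
# action could have achieved with what was actually achieved, then play
# actions in proportion to their positive cumulative regret.

def regret_matching(regrets):
    """Turn cumulative regrets into a probability distribution over actions."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)  # no positive regret: uniform
    return [p / total for p in positives]

# Hypothetical counterfactual values for three actions (fold, call, raise);
# suppose the traverser actually chose to call.
action_values = [0.0, 1.0, 3.0]
realized = action_values[1]                      # value of the action taken
regrets = [v - realized for v in action_values]  # [-1.0, 0.0, 2.0]

# The next strategy favours the action the traverser most regrets not
# taking; here raising gets all the weight.
strategy = regret_matching(regrets)
print(strategy)  # [0.0, 0.0, 1.0]
```

In the full algorithm these regrets accumulate across many simulated hands rather than being used after a single one, which is what makes the strategy improve gradually over iterations.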

Conclusion

In conclusion, Pluribus represents a key AI breakthrough of recent times. Although developed and implemented specifically for poker, the general techniques it employs (reinforcement learning, abstraction, iterative Monte Carlo methods, etc.) can be applied in many other settings. Notably, professional poker players (including Darren Elias, who holds the most World Poker Tour titles) have used lessons learned from playing against, and being defeated by, Pluribus to improve their own strategies.

