OpenAI & DOTA 2: Game Is Hard

This is part of a series about OpenAI’s attempt to build an AI that can play a full game of DOTA 2 against humans. Earlier in the year I wrote about OpenAI’s initial announcement. Last week, they played their first public match against expert humans.

I started playing DOTA 2 in January 2013 – a really exciting time to start, as it turned out. In the professional scene, a team called Alliance was making headlines, playing in a way that seemed completely unstoppable. In the months leading up to their appearance at DOTA 2’s biggest event, The International, Alliance appeared in seven tournaments and placed first in all of them. Alliance only had one style of play, and were often criticised for this, but the simple fact of the matter was that no-one had an answer for it. It didn’t matter if they were only showing one strategy – it was unbeatable.

This week, watching OpenAI’s bots play against humans, I had that 2013 feeling again.

They’re All Dead!

Let’s get the obvious out of the way first – OpenAI’s bots are exceptionally good at some aspects of DOTA 2. In particular, they’re extremely good at teamfighting: the moments when three or more players fight each other, using spells, basic attacks and items to deal damage and apply negative status effects to their opponents. The reason I say this is ‘the obvious’ is partly because the bots won 2-0, so you probably know they’re good already, but also because this is the part of DOTA 2 we expected the bots to excel at first and foremost.

Their 200ms reaction time might sound humanlike, but in that 200ms window they are actually reading the entire map – not just what you see on-screen – and crunching extremely detailed numbers. If I want to know how much damage a hero will take from my spell, I would have to click the hero, note down their health, then hover over their statistics to find out how much spell damage reduction they have, add in my own spell amplification, and then do some multiplication in my head. In reality, I don’t do this – like many human players, I end up playing by ‘feel’. But OpenAI’s bots don’t need to play by feel: they make precise calculations and react almost instantly, and it’s truly unbelievable to watch.
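To make that arithmetic concrete, here’s a rough sketch of the calculation in Python. The function name and the stacking rules are my own simplifications – the real game combines several sources of amplification and resistance – but the sum a human approximates by ‘feel’, and a bot evaluates exactly, looks something like this:

```python
# Simplified sketch of the damage arithmetic described above. The real game
# stacks multiple sources of amplification and resistance; treat this as an
# illustration, not the actual in-game formula.

def effective_spell_damage(base_damage: float,
                           spell_amp: float,
                           magic_resistance: float) -> float:
    """Damage dealt after the caster's amplification and the target's
    magic resistance are applied (both expressed as fractions)."""
    return base_damage * (1 + spell_amp) * (1 - magic_resistance)

# A 300-damage spell with 10% amplification against 25% resistance:
# 300 * 1.10 * 0.75 = 247.5
print(effective_spell_damage(300, 0.10, 0.25))  # 247.5
```

A human estimates this mid-fight, under pressure; the bots can evaluate it precisely, for every hero they can see, every time they act.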

Let’s look at an example or two. Late in Game 1, the human player Fogged has purchased a special item called a blink dagger for his hero, Earthshaker. This is a very common item/hero pairing. A blink dagger lets you ‘blink’, or teleport, a short distance instantly, and Earthshaker has a spell that instantly disables enemies close to him. The blink dagger lets you get near people before they can react, and Earthshaker’s spell also stuns them before they can react. In the image below, Fogged is circled in purple on the right-hand side of the screen. He’s waiting for the enemy team to move into position so he can blink onto them.

If you watch the clip, what you’ll see is that as he blinks in, one of the OpenAI bots instantly casts a spell called Hex, and turns him into a frog. In the image below, Fogged is still circled in purple, and the bot is circled in yellow. Hex wears off after a few seconds and Fogged does eventually cast his spell, but it gives the bots enough time to react and win the fight. Even though this is one of the most talked-about moments in the game, Fogged himself explains that he made several mistakes that gave the bots more time to react. There are ways to queue actions up in DOTA so they are executed one frame after another, but there’s no way to cancel this if you change your mind. Fogged explains he decided not to do it – but if he had, it’s possible the bot wouldn’t have reacted fast enough. Nevertheless, it’s a great bit of DOTA theatre, and you can hear the excitement in the commentator’s voice.

Here’s a second example, and one that I think is even more impressive. The human player Blitz, circled in red in the image below, has taken a lot of damage and tries to run away to the north. Two different heroes use their ultimate – their most powerful spell – to attack him. The first hero is Lich, circled in purple, on the far left of the screen – his ultimate is a slow-moving blue ball. The second is Sniper, offscreen in the direction of the yellow arrow – his ultimate is a bright white/yellow bullet. The combined damage of these two spells is just enough to kill Blitz – if either one had not been used, he would’ve survived.

Why is this impressive? In a human team it would be almost impossible to assess how much damage is required and co-ordinate with another player, all in the space of a few seconds, so that both spells land and secure the kill. What’s more likely to happen is that one human would decide not to use their spell, and Blitz would escape. Alternatively, in another common scenario, humans use too many spells and waste vital resources they could have used to get another kill or win a later fight. But OpenAI never does this – throughout all three matches, OpenAI’s bots consistently use exactly the resources required to achieve a goal. No more, no less. As I mentioned above, in some ways this is exactly the kind of problem we expect an AI to be naturally good at, but it’s still amazing to see it actually happening in such a complex game. I was consistently left speechless by the level of co-ordination and execution on display.
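You can phrase the decision the bots keep getting right as a small optimisation problem: commit the smallest set of spells whose combined damage is just enough to kill, and hold everything else back. The sketch below is hypothetical – the spell names are real (Chain Frost is Lich’s ultimate, Assassinate is Sniper’s), but the damage numbers are made up, and nothing suggests the bots literally enumerate combinations like this:

```python
# Hypothetical sketch of the resource decision described above: find the
# smallest set of spells whose combined (post-resistance) damage secures
# the kill, preferring the combination that wastes the least damage.
from itertools import combinations

def spells_to_commit(spells: dict[str, float], target_hp: float):
    """spells maps each spell to the damage it would actually deal.
    Returns the smallest sufficient combination, or None if no kill."""
    best = None
    for size in range(1, len(spells) + 1):
        for combo in combinations(spells, size):
            damage = sum(spells[s] for s in combo)
            if damage >= target_hp and (best is None or damage < best[1]):
                best = (combo, damage)
        if best:  # a smaller combination always wins, so stop at this size
            return best[0]
    return None

# Blitz's situation, with illustrative numbers: either ultimate alone
# falls short, but the pair together is just enough.
print(spells_to_commit({"Chain Frost": 280, "Assassinate": 320}, 550))
# -> ('Chain Frost', 'Assassinate')
```

Humans fail at this not because the arithmetic is hard, but because two players must each do it independently, reach the same answer, and act on it within a second or two.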

That’s Playing To Win, Baby!

DOTA 2 isn’t just a game about fighting, though. There are lots of things a hero can do besides fighting, many of which have more complex, delayed rewards. For instance, instead of fighting the enemy now, I could send my hero to earn gold by killing monsters, making them stronger for a fight later. But if I do that, the enemy team might take advantage and attack my team’s base, the key to winning the game. DOTA has lots of tradeoffs like this, and many of these decisions have impacts that are only really felt five, ten, or thirty minutes later. We already know OpenAI is good at the minute-to-minute play, but can it plan for the future, across an entire game?

The answer you get will depend on who you ask. Commenting a month ago, before he played in the exhibition match, expert player Blitz said that the bots had discovered and understood “one of the highest level plays you can make [in DOTA 2]” by learning to avoid parts of the map that didn’t benefit them. Some of these strategies, he said, he had only been made aware of after eight years of playing the game, and Blitz is a top-level analyst, commentator and ex-professional player. This seems to suggest that the bots are reasoning about their overall team strategy at an extremely high level. However, I think there’s reason to believe this is overstating things a little.

To give an example of why I think this is the case, let’s look at two similar incidents that involve pulling. Pulling is a fairly intricate idea invented by players of DOTA 1. Without going into details, pulling manipulates the way AI-controlled soldiers move around the map, to deny resources to the enemy. In Game 1, the human team perform a pull, and the bots react in a very confused way. A human team would investigate where a pull normally takes place and contest it, but the bots stand around and do nothing. Click here to see a clip.

In Game 2 a similar scenario happens. The humans perform a pull, disrupting the flow of AI soldiers running through the map. This time, however, the OpenAI bots have placed a ward which provides vision in an area and allows them to see the pull happening. They instantly run over and fight the humans, killing both of them and completely disrupting the pull attempt.

The difference between these two incidents is clear – in one game, the bots can see what is happening and go to intervene; in the other, they can’t see anything. DOTA 2 is a game about imperfect information, and gaining vision of the enemy is a big part of this. While the bots exhibit some understanding of this, they play better when they know where the players are (which is to be expected – definite knowledge beats estimation).

Watching the bots play, it felt like vision was a crucial driving force behind a lot of what they did. Because of how reliably superior they are in teamfights, the bots take any opportunity they have to kill enemy heroes, which helps provide them with resources like gold and delays the enemy team from building up strength. What this means is that when heroes reveal themselves, OpenAI’s bots seemed to want to go and fight them. Speaking after the series, Blitz said that “it felt like I was pressured at all times in the game” and that the bots “knew how to choke a map out”, meaning that they starved the players of resources. I think the bots’ desire for constant fighting, and the reliability with which these fights were executed, contributed to this feeling of constant pressure.

This desire for constant fighting creates a positive feedback loop which leads to more fighting. Human players tend to gather resources near the safety of their towers. This draws the bots in, since the potential for both killing an enemy hero and attacking an objective exists in the same space. The humans either lose a fight or retreat, which exposes the tower, which the bots then attack and destroy. If at any point the humans decide to dodge an engagement by going to gather resources elsewhere, their presence somewhere else makes that part of the map a more appealing target to attack. This creates the feeling of constant pressure (helped by the fact that the bots are very efficient in their decisions of when and how to travel across the map).

So it seems possible that just a few basic guiding principles, combined with the ability to flawlessly teamfight, might be giving the illusion of a much more complex high-level understanding of the game. Instead of long-term planning, or modelling the distribution of resources across the map, my interpretation is that the bots have learned to always look for fights, and in doing so create a perfect storm of pressuring their opponents; taking objectives; and gaining just enough resources in return to fuel the next part of their attack. Unlike human players who employ a similar strategy, there’s no risk of a miscalculation or an economic slip-up. The bots are always playing at the very edge of what seems possible, but they never slip up and fall over the edge of the cliff.

Of course, whether or not you agree with my assessment, one important thing remains true: it absolutely doesn’t matter whether the bots are planning ahead or not. Whether they’re planning thirty seconds ahead or thirty minutes ahead, they’re currently dominating every normal game they play, and until human players find a way to apply pressure on them – or until lifted restrictions make the game harder for the bots – OpenAI don’t really need to worry about finding other strategies. Much like Alliance in their run-up to The International 2013, until a team can demonstrate they can beat the playstyle they’ve developed, they don’t need to find a second one.

I… Uh, That Was… Questionable, At The Best

One thing I want to praise OpenAI for in particular is their approach to Game 3 of the series. The AI won the first two games, meaning it had already won the series under the rules of a Best of 3. To use up the extra time, OpenAI let the crowd pick heroes for the bots, and ran another game. In the first two games, OpenAI estimated its chance of winning at above 95%. After the crowd picked the worst heroes they could think of, OpenAI’s estimate for Game 3 was below 3%. What followed was a victory for the humans, one that showed up the limitations of OpenAI’s approach. Playing the game out anyway was a very bold move and a great showing of vulnerability, something most AI companies refuse to do.

Even though the draft was bad, OpenAI’s estimation of a 3% chance to win surprised me. Superior mechanical skill can often combat a bad draft, and OpenAI had already proved they were very capable in that department. But as the game began, it became a little clearer why OpenAI had rated its chances so low – it seemed to be playing a very similar style to the first two games, even though its heroes were now very inappropriate for it. If anything, its aggression was ratcheted up even further: it sent four of its five heroes to the same part of the map at the start of the game, something absolutely unheard of in regular DOTA matches.

OpenAI’s Game 3 heroes were mostly what we would call core or utility heroes. These are heroes which need one or two items to be effective, or which become more and more powerful the longer the game goes on. Strategies which utilise these heroes normally rely on delaying the game and safely acquiring resources until they can become powerful. A hero like Slark, which OpenAI were given to play with, might gather gold for the first ten minutes, then start fighting once he has purchased his first big item.

But OpenAI didn’t do this – instead Slark was sent into fights, along with their other heroes. At first it seemed to be working, securing them a few early kills. But grouping heroes together like this leads to extreme diminishing returns. Experience points, awarded to heroes when they kill enemy players, are divided equally among all nearby allied heroes. That means each hero only received a quarter of the experience for each kill, and had to compete for the small amount of gold available too. This is why, traditionally, heroes spread out across the map early on, so core heroes can gather the resources they need to become powerful later.
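The arithmetic of that penalty is simple but brutal. Here’s a minimal illustration – the bounty number is made up, and the real game varies kill experience by level and game time, but the division is the point:

```python
# Minimal illustration of the diminishing returns described above: kill
# experience is shared evenly among nearby allied heroes, so grouping four
# heroes quarters each one's share. (Bounty value is illustrative only.)

def xp_per_hero(kill_xp: float, heroes_nearby: int) -> float:
    return kill_xp / heroes_nearby

print(xp_per_hero(400, 1))  # solo kill: 400 XP to one hero
print(xp_per_hero(400, 4))  # grouped kill: 100 XP each
```

Four grouped heroes need four times as many kills just to level at the rate of one solo hero, which is why a few early kills from grouping up rarely pay for themselves.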

As the game went on, and the bots began to fall behind, their behaviour became increasingly erratic. Some of the things the bots did seemed to have flashes of intelligence – at one point, Slark runs across the length of the map, distracting the NPC soldiers and delaying the progress of the human team. But although this might have some upsides, it’s not close to what OpenAI needed to be doing (in my opinion), which was building up their big powerhouse heroes like Slark and using other heroes to buy time. Interestingly, even basic hero control seemed to suffer as the bots were pushed further out of their comfort zone: later in the game, Slark is seen running into an allied hero repeatedly in an attempt to move past her, something we didn’t see in Games 1 or 2.

It’s possible that my read on this situation is wrong. One argument might be that the bots are so sure they cannot win a long game that they decide it is better to gamble on overwhelming the humans early. To me, however, Game 3 suggested that OpenAI still has issues planning in the long term and, as we explored in the previous section, mainly excels at aggressive, fast-paced, minute-to-minute DOTA. In the Q&A afterwards, the engineers noted that increasing the bots’ ability to look into the future is one of their priorities, and it’ll be exciting to see how their strategies change as a result.

I Can’t Believe What I’m Seeing!

In August 2013 I watched my first International, the biggest event in the DOTA 2 calendar and the de facto end of the DOTA 2 competitive season. The Alliance had had a near-flawless run through the main event, reaching the final having only lost a single game. In every game it was the same story – teams would struggle to ban Alliance’s best heroes during the hero selection phase, but they never had enough bans, and Alliance would secure something powerful. The game would begin, Alliance would dominate, and they would win.

In the grand finals, their opponents did something strange – instead of banning Alliance’s best heroes, they used all of their bans on Alliance’s most innocuous, least glamorous player, Akke. They removed the heroes he loved to play, and suddenly Alliance became unstable. Despite getting their best heroes, they lost Games 2 and 3 of the five-game series. Through a remarkable adaptation in Game 4, and a legendary series of events in Game 5, Alliance did eventually claw their way back and secure their World Champion title. It’s a feat that is still talked about today as one of the greatest series in DOTA 2 history.

In a week or two, OpenAI will walk onto a similar stage at this year’s International, and play a team of world-class professionals. Like the Alliance, they’re coming off the back of a run of spectacular successes, and like the Alliance, they seem to only have one very powerful strategy. I don’t know whether a human team will be able to upset the bots enough to give us as dramatic a series as Alliance had in 2013, but I’m fairly sure that regardless of the outcome, it’s a series that will be remembered as incredibly significant in its own right.

A lot of people still joke that the Alliance didn’t really play DOTA 2 – they won because the state of the game at the time favoured them, because of the way certain mechanics worked, or because they played a particularly reviled style of DOTA that teams found hard to deal with. The same could be said of OpenAI’s bots – we’ve seen clear evidence of their weaknesses, we’ve seen them make mistakes, we’ve seen them act strangely when they’re pushed out of their comfort zone. But ultimately, a win is a win, and while OpenAI might not have a perfect understanding of every aspect of DOTA 2, whatever understanding it does have has created an extremely formidable team.

I’ll be looking forward to the match, and eager to see whether more restrictions can be lifted by then. But regardless, it’s a remarkable achievement to have gotten this far in just twelve months. I have a few more posts planned about OpenAI’s project which I hope to write between now and then, but in the meantime, I highly recommend you check out bits and pieces of the exhibition matches, which can all be found at this link. Congratulations to both Team Human and OpenAI for putting on a great show.

Thanks to Azalea, Andrew, Chris, Fed, John and Charlie for reading drafts of this post.