I’m in Salamanca, Spain this week to attend the International Conference on Computational Creativity, and I haven’t slept in 30 hours, but OpenAI dropped a big piece of news today about their DOTA 2 research and I wanted to share a few thoughts in case you’re interested in the project and want a different angle on it. These aren’t particularly polished thoughts, apologies in advance, but don’t worry: you’ll have no end of thinkpieces and articles about it before the month is out.
OpenAI, an AI foundation funded by Elon Musk, has built a multi-agent AI system to play a heavily simplified version of DOTA 2, a popular competitive online game. You might remember that last year they did something similar on an even more simplified subset of DOTA 2 called 1v1 Mid. This new version is still a long way from playing a full game of DOTA 2, but it takes several important steps towards it.
Next month OpenAI will stream a live game of the bots playing a team of “top” human players, and in August they’ll appear live on stage at The International and play an all-star lineup of human players, with most of the current restrictions (covered below) still in place.
1. OpenAI really have made some bots teach themselves the basics of DOTA 2, albeit with a bit of help. There are some important human-coded caveats which we’ll get to later, but the bots start by wandering the map and eventually learn to last-hit, push lanes, and even creep block (something that was human-coded in the 1v1 system last year). That’s pretty cool, if you ask me!
2. They’ve made some cool advances in understanding long-term rewards, which is a big issue in a game like DOTA 2 where sacrifices may be made in the short term to make gains in the long term. I’m sure this still has a ways to go, but the official blog post has some interesting data about future reward weighting. This is important because OpenAI are using DOTA 2 as a means to an end, and this is a good general result that can be taken forward.
3. They can beat humans at a (special) videogame, again, and even though I’m about to tear into this concept for a couple of reasons, it’s still an achievement to start with nothing and use reinforcement learning to have an average grasp of DOTA concepts like teamfighting and tower-pushing. The bots can blink around, cast their spells, use items, stuff like that. Despite all my misgivings about this line of AI-as-spectacle, I’m pretty excited for their live showmatches coming up in July and August.
4. Reinforcement learning is pretty good for a game like DOTA 2. For most game AI you might expect to design an experience carefully for players to enjoy, which this reinforcement learning approach doesn’t really help developers to do, so it’s not exactly going to become the way all game AI is programmed any time soon. But DOTA 2 is a fundamentally human game – no-one really wants to use these bots to replace humans, I don’t think. What they might do, however, is give us insight into new ways to play the game, hidden depths or unseen imbalances, something I feel isn’t discussed enough when talking about this work. A lot of what is now standard DOTA 2 originally came from bugs, weird emergent behaviour in the game’s systems, or community trends. AI with no preconceptions about how the game works will potentially find even more stuff like this.
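To make the long-term reward point from item 2 a bit more concrete, here is a minimal sketch of how future rewards are usually weighted in reinforcement learning. I am assuming the standard exponential discounting scheme; the reward numbers and discount factors are made up for illustration and are not taken from OpenAI’s system.

```python
import math


def discounted_return(rewards, gamma):
    """Sum of rewards where a reward t steps ahead is weighted by gamma**t."""
    total = 0.0
    # Work backwards so each step folds in the discounted value of the future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def half_life_steps(gamma):
    """Number of steps after which a future reward counts for half as much."""
    return math.log(0.5) / math.log(gamma)


# Hypothetical scenario: take a small loss now for a big payoff 100 steps later.
sacrifice = [-1.0] + [0.0] * 99 + [10.0]

# A short-sighted agent (low gamma) sees the sacrifice as a net loss...
short_sighted = discounted_return(sacrifice, gamma=0.9)
# ...while a far-sighted agent (gamma close to 1) sees it as worthwhile.
far_sighted = discounted_return(sacrifice, gamma=0.999)

print(short_sighted, far_sighted)
print(half_life_steps(0.9), half_life_steps(0.999))
```

The closer the discount factor is to 1, the further ahead the agent effectively plans, which is why this weighting matters so much in a game built around short-term sacrifices.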
1. The restrictions in place are even more significant than before, and it’s really important to understand that we are still dealing with a small fraction of DOTA 2’s complexity here. This doesn’t mean the work is not impressive – it’s just important to get things in perspective. The bots have human-authored item and skill builds that never change, and they are forced into lanes and not allowed to leave them for a while. Many items and game features are disabled (including crucial concepts like vision). Most importantly, every match is a mirror matchup of the same five heroes. Not only do mirror matchups not exist in DOTA 2, the heroes chosen are about as simple as it gets, and don’t have a lot of interesting synergy on offer (although there’s a little here or there).
2. Some of the restrictions may inadvertently be making other tasks easier. For example, the bots aren’t allowed to buy or use wards, which are static items that provide vision. One of the functions of wards is that they help give you information when a hero is coming to kill you from another part of the map. Without wards, these kill-movements are a lot easier – I don’t know if these bots will get better or worse at this if vision is added back in. Similarly, in their flashy trailer video they show Crystal Maiden blinking in with a BKB (an item which makes her immune to magic) to cast her ultimate. If my understanding is correct and the item builds are limited, the players might be pretty helpless against this. Buying items appropriate to the situation is a huge part of DOTA 2 – without it, some heroes have a very easy life.
3. Most of the very visible tasks are things that we would expect computers to completely outclass humans at – precision, timing-based actions. Learning to choose these actions is of course an interesting challenge, but the way we perceive games like DOTA 2 means that we’re impressed at how fast the computer reacts with Skill X, instead of being impressed that it learned that Skill X was the correct response (because to a human, learning to use Skill X is easy, and the reaction speed is the thing that you have to practice). I have a little more to say about this below.
4. OpenAI seem to have sidestepped some problems, and it’s not clear why. The task of hero selection makes a huge difference to the game – their bots are playing 180 years of DOTA 2 every day, over and over and over again with the same five heroes. In regular DOTA 2, the number of possible hero combinations is 115-choose-10 or around 74,500,000,000,000. Transferring knowledge from different scenarios across neural networks is still pretty hard – learning to last hit with hero X against hero Y with hero Z supporting is different in subtle ways if we swap them out for heroes A, B and C.
Maybe they’re just waiting to do this later? It’s possible, but last year OpenAI’s 1v1 Mid bot only played Shadow Fiend and, to the best of my knowledge, it still does. If they wanted to solve the hero selection problem, 1v1 Mid seems like an easier place to start. It’s possible this just wasn’t a juicy enough headline and wouldn’t have netted them stage time at The International this year (and part of their job is to promote their work). But it’s also possible that hero selection explodes the problem space so much (see also: item and skill selection) that we don’t really know where to start.
5. Beating humans at DOTA 2 is either impossible, or totally trivial, depending on how you feel about it. OpenAI explain that their bots have a reaction time of about 80ms, which they note is faster than a human’s. But it’s not just faster than a normal human reaction, such as clicking a key to react to an attack. In that 80ms the system doesn’t just react to a single event – it can parse the entirety of the game map, from top to bottom, using a perfectly precise API that describes the game in a series of numbers. The bots are not just fast, they’re processing more information than any human will ever be able to. Many tasks that humans struggle at – such as ‘armlet toggling’, which became a point of controversy this week as it emerged a pro player was using scripts to perform it – are as trivial to an AI as walking down a lane or hitting a creep.
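For anyone who wants to check the hero combination figure from item 4, here is a quick sketch. The pool of 115 heroes and the ten picks per match come from the number quoted above; splitting the ten picks into two teams of five is my own extra step and is not part of the original figure.

```python
import math

HERO_POOL = 115        # hero pool size used in the figure above
HEROES_PER_MATCH = 10  # five per team, no duplicates

# Number of ways to choose which ten heroes appear in a match.
unordered_picks = math.comb(HERO_POOL, HEROES_PER_MATCH)

# Extra assumption (not in the original figure): the same ten heroes can be
# divided into two unordered teams of five in several different ways.
team_splits = math.comb(HEROES_PER_MATCH, 5) // 2

matchups = unordered_picks * team_splits

print(f"{unordered_picks:.3e} sets of ten heroes")
print(f"{team_splits} ways to split each set into two teams")
print(f"{matchups:.3e} distinct five-versus-five matchups")
```

Either way you count it, the bots are currently training on exactly one of these matchups.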
A Short Note About Being Superhuman
Let me put this last point in perspective with a quick example: stacking stuns is a term used in DOTA 2 for when stuns overlap, so that part of one stun’s duration is wasted. Suppose I have a spell that stuns you for 1 second, and another spell that stuns you for 2. If I use them one after another, perfectly, they stun you for 3 seconds. If I panic and use the second one half a second too early, I only get 2.5 seconds. Avoiding stacking stuns is really important in DOTA 2.
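The arithmetic above can be sketched in a few lines. This is a simplified model of my own, not how the game engine implements it: I am assuming the target is stunned whenever at least one stun is active, and that overlapping stuns do not extend each other.

```python
def total_stun_time(casts):
    """Total seconds the target spends stunned.

    casts is a list of (cast_time, duration) pairs in seconds.
    Simplifying assumption: overlapping stuns do not add together, so the
    result is the combined length of the time covered by any stun.
    """
    intervals = sorted((start, start + duration) for start, duration in casts)
    total = 0.0
    current_end = float("-inf")
    for start, end in intervals:
        if start > current_end:
            # A gap before this stun: the whole duration counts.
            total += end - start
            current_end = end
        elif end > current_end:
            # Overlaps (or touches) the previous stun: only the extension counts.
            total += end - current_end
            current_end = end
    return total


# The 1 second stun followed by the 2 second stun, chained perfectly.
perfect = total_stun_time([(0.0, 1.0), (1.0, 2.0)])
# The same two stuns, but the second is cast half a second too early.
panicked = total_stun_time([(0.0, 1.0), (0.5, 2.0)])

print(perfect, panicked)  # 3.0 2.5
```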
You used to have to count this out in your head, which was pretty hard (a lot of heroes have stuns of different lengths in DOTA 2, and patches often increase or decrease them by 0.2 of a second). One day, Valve added a stun indicator (see above) which counted down, so you could easily see how long there was left on the active stun. This was much nicer.
The community was in uproar, however, because this was seen as lowering the skill barriers. People who had spent a long time perfecting this felt that those who hadn’t learned were getting a free pass. This skill, a staple of DOTA 2 and a key ability for good players, is utterly trivial to an AI. They don’t have to count and guess in their head, or remember a number, they don’t even have to look at a bar – they receive a crisp, exact floating point number describing the number of seconds left on a stun, and can act within 80ms to cast another spell.
The reason I’m belabouring this point is that OpenAI’s bot is already superhuman, even without playing the whole of DOTA 2. It’s unsurprising that a system with such perfect access to information and the ability to react quickly is able to outplay humans. The interesting thing here is that it learned these basic skills, not how well it’s able to execute them. That part is very obvious – and depending on how you view it, isn’t really comparable to how humans play anyway. How good would a player like Dendi be if they could read the entire map and combat log for every 80ms of gameplay?
Games have always made for attractive AI milestones, but most of them have been in the physical world, and that’s made it easier to argue that the system and its human opponent are experiencing the game in the same way. For videogames this is much harder, something we already know from AI research into generating rules and levels. Agents don’t perceive games the same way that humans do, and that makes drawing comparisons between the two tricky. I feel like we probably need to re-examine the entire idea of humans playing against computers and what we actually read into it.
That’s all I’ve got – this turned out to be a good deal longer than planned, but hopefully it gave you some food for thought! I’m off to prepare for ICCC this week – follow me on Twitter for occasional conference updates, and donate to the PROCJAM Kickstarter to help us fund our 2018 jam activities!