OpenAI Universe was very exciting when it first came out. With Universe, it was suddenly possible to train AI to play Flash games, book flights online, and perform various other real-world tasks. Unfortunately, Universe never really took off in the AI world. I don’t know why others aren’t using Universe, but I can tell you why I’m not using it and why I am working hard to build an alternative.
Since Universe has a lot of moving pieces, I’ll give a basic overview of how it works. Every Universe environment runs an instance of Chrome inside a Docker container. An AI interacts with the environment using VNC, which can send keyboard and mouse inputs to Chrome and receive back screen images in real time. To train RL agents on Flash games, Universe uses OCR to read the score off the screen and send it back to the agent as a reward signal. A program in the container controls game flow by looking at the screen and waiting for menus and “game over” pages.
In my view, the biggest problem with Universe is that VNC and Flash need to run in real time. This means that any hiccups on your training machine (e.g. a CPU spike due to a software update) might suddenly change the frame rate at which your AI experiences its virtual environment. It also means that you can’t run Universe at all on a slow machine. This rules out many cloud-hosting instances, for example many EC2 t2 instances.
Since Flash games are hard to reverse-engineer, Universe has no choice but to look at the screen to figure out the state of a game. This approach isn’t very stable. Imagine a game where the “game over” screen fades in gradually, but the “Submit Highscore” button on said screen can be pressed before the fade is complete. If you are running a random agent (one which clicks randomly), it might hit “Submit Highscore” before Universe is able to detect the “game over” screen. The result is that a strange new screen will pop up instead of the “game over” screen, and the Universe infrastructure will crash. I talk more about this flaw in this unresolved Github issue. This is a real issue because most RL algorithms start by taking random actions to explore the environment.
On top of the problems I just mentioned, it seems that OpenAI has internally abandoned Universe. When OpenAI unveiled Universe six months ago, they said the following in their blog post:
In upcoming weeks, we’ll release our environment integration tools, so anyone can contribute new environment integrations.
I guess weeks turned into months turned into never. Those tools are still not available. In the same blog post, they also promised demonstration data:
We’re compiling a large dataset of human demonstrations on Universe environments, which will be released publicly.
That never happened either, which saddens me since I would have loved to get my hands on that data.
Enter μniverse, my alternative to OpenAI Universe. Unlike Universe, μniverse focuses on HTML5 games. Since HTML5 games are written in JavaScript, it is possible to inject new code into them. This makes it much easier (and more reliable) to extract the score and “game over” status from a game. Menu automation is also simpler for HTML5 games, since you can trigger event handlers directly without worrying about the UI. The biggest benefit of HTML5 games, though, is that you can spoof time.
All JavaScript applications use the same APIs for time: Date, setTimeout, requestAnimationFrame, etc. If you inject your own implementation of these APIs, you can completely control how fast time appears to pass. This is exactly what I do in μniverse. Using a time spoofing API, games in μniverse run as fast or as slow as you want them to. You have to explicitly tell the environment to advance time; it won’t happen under your nose. This means that the speed of your machine has no bearing on the AI’s perception of its environment.
In μniverse, it is easy to guarantee that all actions are “safe”. In other words, μniverse ensures that there is nothing the agent can do to escape or break its environment. For example, it is easy to disable things like the “Back to Main Menu” button present in many games. Universe attempted to do this by forbidding clicks at certain locations, but it is undocumented and not fully supported.
I’ve been adding games to μniverse for a few weeks now. Right now I’ve integrated 27 games, and I expect that number to keep rising (update Jun 26, 2017: up to 54 games now). As I go, I’ve been training AI agents on some of the new games. It’s very satisfying to see AI perform better than me at some games. Throughout this process, I have to say that μniverse has felt more stable than Universe (I’ve trained games using both systems). μniverse is also much faster in terms of environment boot times, but that’s just a bonus.
I’d like to close with a nerdy technical note. While μniverse still runs Chrome inside of Docker, it runs Chrome in the new headless mode. This means that VNC is not needed. Instead, μniverse uses the Chrome DevTools protocol to capture screenshots, trigger events, navigate to webpages, and inject JavaScript. Hooking directly into Chrome like this is simpler and much more stable. It feels like a very elegant solution, and it has proven to be so in practice.