Why I’m Remaking OpenAI Universe

OpenAI Universe was very exciting when it first came out. With Universe, it was suddenly possible to train AI to play Flash games, book flights online, and perform various other real-world tasks. Unfortunately, Universe never really took off in the AI world. I don’t know why others aren’t using Universe, but I can tell you why I’m not using it and why I am working hard to build an alternative.

Since Universe has a lot of moving pieces, I’ll give a basic overview of how it works. Every Universe environment runs an instance of Chrome inside a Docker container. An AI interacts with the environment using VNC, which can send keyboard and mouse inputs to Chrome and receive back screen images in real time. To train RL agents on Flash games, Universe uses OCR to read the score off the screen and send it back to the agent as a reward signal. A program in the container controls game flow by looking at the screen and waiting for menus and “game over” pages.

In my view, the biggest problem with Universe is that VNC and Flash need to run in real time. This means that any hiccup on your training machine (e.g. a CPU spike due to a software update) can suddenly change the frame rate at which your AI experiences its virtual environment. It also means that you can’t run Universe at all on a slow machine, which rules out many cloud-hosted machines, including many EC2 t2 instances.
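To put rough numbers on the frame rate problem, here is a small back-of-the-envelope sketch. The 60 frames per second figure and the step durations are assumptions chosen for illustration, not measurements from Universe, and `frames_elapsed` is a made-up helper name.

```python
def frames_elapsed(step_seconds, game_fps=60):
    """Game frames that go by while the agent spends step_seconds on one action.

    game_fps=60 is an assumed frame rate, used only for illustration.
    """
    return round(step_seconds * game_fps)


# A typical step versus a step that lands in the middle of a CPU spike.
normal_step = frames_elapsed(0.05)
slow_step = frames_elapsed(0.50)

if __name__ == "__main__":
    print(normal_step, slow_step)
```

Under these assumptions, the same action covers 3 frames of game time on a good step and 30 on a bad one, so the agent’s actions have inconsistent effects for reasons that have nothing to do with the game.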

Since Flash games are hard to reverse-engineer, Universe has no choice but to look at the screen to figure out the state of a game. This approach isn’t very stable. Imagine a game where the “game over” screen fades in gradually, but the “Submit Highscore” button on that screen can be pressed before the fade is complete. If you are running a random agent (one which clicks randomly), it might hit “Submit Highscore” before Universe is able to detect the “game over” screen. The result is that a strange new screen will pop up instead of the “game over” screen, and the Universe infrastructure will crash. This is a real problem in practice, because most RL algorithms start by taking random actions to explore the environment. I talk more about this flaw in this unresolved GitHub issue.

On top of the problems I just mentioned, it seems that OpenAI has internally abandoned Universe. When OpenAI unveiled Universe six months ago, they said the following in their blog post:

“In upcoming weeks, we’ll release our environment integration tools, so anyone can contribute new environment integrations.”

I guess weeks turned into months turned into never. Those tools are still not available. In the same blog post, they also promised demonstration data:

“We’re compiling a large dataset of human demonstrations on Universe environments, which will be released publicly.”

That never happened either, which saddens me since I would have loved to get my hands on that data.

Enter μniverse, my alternative to OpenAI Universe. Unlike Universe, μniverse focuses on HTML5 games. Since HTML5 games are written in JavaScript, it is possible to inject new code into them. This makes it much easier (and more reliable) to extract the score and “game over” status from a game. Menu automation is also simpler for HTML5 games, since you can trigger event handlers directly without worrying about the UI. The biggest benefit of HTML5 games, though, is that you can spoof time.

All JavaScript applications use the same APIs for time: Date, setTimeout, requestAnimationFrame, etc. If you inject your own implementation of these APIs, you can completely control how fast time appears to pass. This is exactly what I do in μniverse. Using a time spoofing API, games in μniverse run as fast or as slow as you want them to. You have to explicitly tell the environment to advance time; it won’t happen under your nose. This means that the speed of your machine has no bearing on the AI’s perception of its environment.
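To make the idea concrete, here is a minimal model of a spoofed clock. It is written in Python purely as an illustration; it is not the JavaScript that μniverse actually injects, and the names `VirtualClock`, `set_timeout`, `advance`, and `run_game` are made up for this sketch. The point it shows is that timers fire on a virtual timeline that only moves when the caller says so.

```python
import heapq
import itertools


class VirtualClock:
    """A fake clock whose time only moves when advance() is called."""

    def __init__(self):
        self._now_ms = 0.0
        self._timers = []              # min-heap of (deadline, seq, callback)
        self._seq = itertools.count()  # tie-breaker: equal deadlines fire FIFO

    def now(self):
        """Stand-in for Date.now(): current virtual time in milliseconds."""
        return self._now_ms

    def set_timeout(self, callback, delay_ms):
        """Stand-in for setTimeout: schedule callback on the virtual timeline."""
        deadline = self._now_ms + max(0.0, delay_ms)
        heapq.heappush(self._timers, (deadline, next(self._seq), callback))

    def advance(self, ms):
        """Move virtual time forward by ms, firing due timers in order."""
        target = self._now_ms + ms
        while self._timers and self._timers[0][0] <= target:
            deadline, _, callback = heapq.heappop(self._timers)
            self._now_ms = deadline    # the callback sees its deadline as "now"
            callback()
        self._now_ms = target


def run_game(clock, frames_seen):
    """A toy game loop that draws one frame every 16 virtual milliseconds."""
    def frame():
        frames_seen.append(clock.now())
        clock.set_timeout(frame, 16)
    clock.set_timeout(frame, 16)


if __name__ == "__main__":
    clock = VirtualClock()
    frames = []
    run_game(clock, frames)
    clock.advance(50)   # nothing happens until we ask for it
    print(frames)       # prints [16.0, 32.0, 48.0]
```

However long the host machine takes to execute `advance(50)`, the toy game sees exactly three frames at exactly the same virtual timestamps.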

In μniverse, it is easy to guarantee that all actions are “safe”. In other words, μniverse ensures that there is nothing the agent can do to escape or break its environment. For example, it is easy to disable things like the “Back to Main Menu” button present in many games. Universe attempted to do this by forbidding clicks at certain locations, but that feature is undocumented and not fully supported.

I’ve been adding games to μniverse for a few weeks now. Right now I’ve integrated 27 games, and I expect that number to keep rising (update Jun 26, 2017: up to 54 games now). As I go, I’ve been training AI agents on some of the new games. It’s very satisfying to see AI perform better than me at some games. Throughout this process, I have to say that μniverse has felt more stable than Universe (I’ve trained agents using both systems). μniverse is also much faster in terms of environment boot times, but that’s just a bonus.

I’d like to close with a nerdy technical note. While μniverse still runs Chrome inside of Docker, it runs Chrome in the new headless mode. This means that VNC is not needed. Instead, μniverse uses the Chrome DevTools protocol to capture screenshots, trigger events, navigate to webpages, and inject JavaScript. Hooking directly into Chrome like this is simpler and much more stable. It feels like a very elegant solution, and it has proven to be so in practice.
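For readers who have not seen the DevTools protocol, here is a rough Python sketch of the kind of messages involved. It is not μniverse’s code (which is written in Go); the helper names are hypothetical, and the transport is left out. The method names (`Page.navigate`, `Runtime.evaluate`, `Page.captureScreenshot`, `Input.dispatchMouseEvent`) are real protocol commands, which are sent as JSON text over a WebSocket to the page’s debugging endpoint.

```python
import base64
import itertools
import json

_next_id = itertools.count(1)  # each command carries a unique id


def cdp_command(method, **params):
    """Serialize one DevTools protocol command as a JSON string."""
    return json.dumps({"id": next(_next_id), "method": method, "params": params})


def navigate(url):
    """Load a page (for example, a game's URL)."""
    return cdp_command("Page.navigate", url=url)


def inject_script(expression):
    """Evaluate JavaScript inside the page."""
    return cdp_command("Runtime.evaluate", expression=expression)


def capture_screenshot():
    """Ask Chrome for a PNG of the current page."""
    return cdp_command("Page.captureScreenshot", format="png")


def click(x, y):
    """A left click is a press followed by a release at the same point."""
    return [
        cdp_command("Input.dispatchMouseEvent", type=kind, x=x, y=y,
                    button="left", clickCount=1)
        for kind in ("mousePressed", "mouseReleased")
    ]


def decode_screenshot(reply_text):
    """Extract raw image bytes from a Page.captureScreenshot reply."""
    reply = json.loads(reply_text)
    return base64.b64decode(reply["result"]["data"])
```

Everything the agent needs (frames in, input events out, and injected scripts) goes through this one channel, which is why VNC is no longer required.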

8 thoughts on “Why I’m Remaking OpenAI Universe”

  1. Can’t you just use OpenAI Gym for this? Isn’t the whole point of Universe to let an RL agent control software that can only be accessed through video and keyboard?

  2. OpenAI Gym is simply a Python API for implementing and using RL environments. In itself, OpenAI Gym doesn’t have lots of games to use (although Gym does ship with some Atari games). OpenAI Universe actually uses OpenAI Gym to expose its API. This does not fix or change any of the problems with Universe, such as speed variation, endgame bugs, etc.

  3. I know most ML libraries are in Python, but I am also interested in the development of ML libraries in the browser, i.e. in JavaScript. Since we are talking about HTML5 games, do you think there could be a Node.js or JavaScript version of Universe?

  4. There’s no real reason to restrict ML to Python. In fact, I use Go for all my ML work, including µniverse. Writing language bindings is always a slight pain, but it’s always possible. Already, I have Python bindings for µniverse, and Node bindings wouldn’t be hard to add as well.

    However, while JavaScript might seem like the perfect fit for µniverse, there are a few things to consider. µniverse manipulates headless Chrome so much (page re-loading, time spoofing, etc.) that you wouldn’t want any training code running in there. Rather, you’d want the agent code running in Node, separate from the actual environment instances (and ideally controlling multiple instances at once). In general, µniverse is designed so that the game, and only the game, is running in Chrome in an isolated Docker container. Putting training/agent code inside the container would not be ideal.
