OpenAI Universe was very exciting when it first came out. With Universe, it was suddenly possible to train AI to play Flash games, book flights online, and perform various other real-world tasks. Unfortunately, Universe never really took off in the AI world. I don’t know why others aren’t using Universe, but I can tell you why I’m not using it and why I am working hard to build an alternative.
Since Universe has a lot of moving pieces, I’ll give a basic overview of how it works. Every Universe environment runs an instance of Chrome inside a Docker container. An AI interacts with the environment using VNC, which can send keyboard and mouse inputs to Chrome and receive back screen images in real time. To train RL agents on Flash games, Universe uses OCR to read the score off the screen and send it back to the agent as a reward signal. A program in the container controls game flow by looking at the screen and waiting for menus and “game over” pages.
In my view, the biggest problem with Universe is that VNC and Flash need to run in real time. This means that any hiccups on your training machine (e.g. a CPU spike due to a software update) might suddenly change the frame rate at which your AI experiences its virtual environment. It also means that you can’t run Universe at all on a slow machine. This rules out a lot of cloud-hosting options, including many EC2 t2 instances.
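To make the real-time problem concrete, here is a minimal sketch (a toy model, not Universe code) of how many game frames go by during each agent step. The function names, the 60 FPS figure, and the example latencies are all assumptions for illustration. The "lockstep" variant is a hypothetical contrast in which the environment only advances when the agent asks it to.

```python
def frames_per_step_realtime(step_ms, game_fps=60):
    """Frames that elapse per agent step when the game runs on wall-clock time.

    step_ms: how long each agent step took on the training machine, in ms.
    Integer arithmetic is used so the result is exact.
    """
    frames = []
    elapsed_ms = 0
    frames_shown = 0
    for ms in step_ms:
        elapsed_ms += ms
        total_frames = elapsed_ms * game_fps // 1000
        frames.append(total_frames - frames_shown)
        frames_shown = total_frames
    return frames


def frames_per_step_lockstep(step_ms, frames_per_action=3):
    """Hypothetical lockstep environment: a fixed number of frames per action,
    no matter how long the agent (or the machine) took."""
    return [frames_per_action for _ in step_ms]


if __name__ == "__main__":
    # Third step is slow, e.g. a CPU spike from a background update (made-up numbers).
    latencies_ms = [50, 50, 500, 50]
    print(frames_per_step_realtime(latencies_ms))
    print(frames_per_step_lockstep(latencies_ms))
```

In the real-time model, the slow step makes the agent skip over a big chunk of the game, so the same action has a very different effect than it did a moment earlier. In the lockstep model the agent's view of time stays constant.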
Since Flash games are hard to reverse-engineer, Universe has no choice but to look at the screen to figure out the state of a game. This approach isn’t very stable. Imagine a game where the “game over” screen fades in gradually, but the “Submit Highscore” button on said screen can be pressed before the fade is complete. If you are running a random agent (one which clicks randomly), it might hit “Submit Highscore” before Universe is able to detect the “game over” screen. The result is that a strange new screen will pop up instead of the “game over” screen, and the Universe infrastructure will crash. This is a real problem because most RL algorithms start by taking random actions to explore the environment. I talk more about this flaw in this unresolved GitHub issue.
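The race described above can be sketched with a toy model. This is not how Universe is implemented. The function names, the 30-frame fade, and the click probabilities are assumptions chosen for illustration. The model assumes the screen watcher only recognizes "game over" once the fade has finished, while the button is clickable from the first frame of the fade.

```python
import random


def run_episode(click_frames, fade_frames=30):
    """Simulate the fade-in of a "game over" screen.

    click_frames: set of frame indices (counted from the start of the fade)
    on which the agent happens to click "Submit Highscore".
    Returns "unknown_screen" if a click lands before the fade completes,
    otherwise "game_over_detected".
    """
    for frame in range(fade_frames):
        if frame in click_frames:
            # The game jumps to a screen the watcher has never seen.
            return "unknown_screen"
    return "game_over_detected"


def crash_rate(click_prob, fade_frames=30, episodes=1000, seed=0):
    """Fraction of episodes in which a random agent triggers the race."""
    rng = random.Random(seed)
    crashes = 0
    for _ in range(episodes):
        clicks = {f for f in range(fade_frames) if rng.random() < click_prob}
        if run_episode(clicks, fade_frames) == "unknown_screen":
            crashes += 1
    return crashes / episodes


if __name__ == "__main__":
    # Hypothetical per-frame chances of hitting the button by accident.
    for p in (0.001, 0.01, 0.05):
        print(p, crash_rate(p))
```

Even a small per-frame chance of hitting the button adds up over the length of the fade and over many episodes, which is why a purely random agent is so good at finding this kind of bug.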
On top of the problems I just mentioned, it seems that OpenAI has internally abandoned Universe. When OpenAI unveiled Universe six months ago, they said the following in their blog post:
In upcoming weeks, we’ll release our environment integration tools, so anyone can contribute new environment integrations.
I guess weeks turned into months turned into never. Those tools are still not available. In the same blog post, they also promised demonstration data:
We’re compiling a large dataset of human demonstrations on Universe environments, which will be released publicly.
That never happened either, which saddens me since I would have loved to get my hands on that data.
In μniverse, it is easy to guarantee that all actions are “safe”. In other words, μniverse ensures that there is nothing the agent can do to escape or break its environment. For example, it is easy to disable things like the “Back to Main Menu” button present in many games. Universe attempted to do this by forbidding clicks at certain locations, but that feature is undocumented and not fully supported.
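For a rough idea of what "forbidding clicks at certain locations" looks like, here is a minimal sketch. It is not the API of Universe or μniverse. `Rect`, `filter_click`, and the button coordinates are hypothetical names and values made up for the example.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Rect(NamedTuple):
    """Axis-aligned screen region, in pixels (hypothetical helper type)."""
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def filter_click(click: Tuple[int, int],
                 forbidden: Sequence[Rect]) -> Optional[Tuple[int, int]]:
    """Return the click unchanged, or None if it lands in a forbidden region."""
    px, py = click
    if any(region.contains(px, py) for region in forbidden):
        return None
    return click


if __name__ == "__main__":
    # Made-up location of a "Back to Main Menu" button.
    main_menu_button = Rect(x=10, y=10, width=120, height=40)
    print(filter_click((50, 30), [main_menu_button]))    # dropped
    print(filter_click((300, 200), [main_menu_button]))  # passed through
```

A coordinate blacklist like this breaks as soon as a button moves or appears on a different screen, which is part of why disabling the button inside the game itself is the more reliable route.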
I’ve been adding games to μniverse for a few weeks now. Right now I’ve integrated 27 games, and I expect that number to keep rising (update Jun 26, 2017: up to 54 games now). As I go, I’ve been training AI agents on some of the new games. It’s very satisfying to see AI perform better than me at some games. Throughout this process, I have to say that μniverse has felt more stable than Universe (I’ve trained agents on games using both systems). μniverse is also much faster in terms of environment boot times, but that’s just a bonus.