Data and Machines – Pickled ML

Suppose the year is 2000 BC and you want to set eyes upon a specific, non-naturally occurring color. To create this color, you must look for the right combination of plants, sand, soils, and clay to produce paint–and even then, there is no guarantee you will get it exactly right. Today, you can search “color picker” on the internet and find a six-digit hex code representing the color in question. Not only can your phone display the color corresponding to the hex code–you can also send this hex code to anybody on Earth and they will see nearly the exact same color. You can just as easily manipulate the hex code, making it darker or lighter, inverting it, or making it “more green”. What’s more, you can use a camera to extract hex codes from natural images and manipulate these in exactly the same way as any other hex code.
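To make those manipulations concrete, here is a minimal Python sketch. The function names (hex_to_rgb, scale_brightness, invert, more_green) and the default step of 32 are my own choices for illustration, not part of any standard; the only thing assumed about the format is that a hex code packs red, green, and blue as two hex digits each.

```python
def hex_to_rgb(code):
    """Parse a hex code such as '#65bcd4' into an (r, g, b) tuple of ints in 0..255."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected six hex digits, got {code!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Format three channels as a hex code, truncating and clamping each to 0..255."""
    return "#" + "".join(f"{max(0, min(255, int(c))):02x}" for c in rgb)


def scale_brightness(code, factor):
    """Darken (factor < 1) or lighten (factor > 1) by scaling every channel."""
    r, g, b = hex_to_rgb(code)
    return rgb_to_hex((r * factor, g * factor, b * factor))


def invert(code):
    """Replace each channel with its distance from 255."""
    r, g, b = hex_to_rgb(code)
    return rgb_to_hex((255 - r, 255 - g, 255 - b))


def more_green(code, amount=32):
    """Push the green channel up by a hypothetical step size, clamped at 255."""
    r, g, b = hex_to_rgb(code)
    return rgb_to_hex((r, g + amount, b))


if __name__ == "__main__":
    color = "#65bcd4"
    print(scale_brightness(color, 0.5))  # darker:     #325e6a
    print(invert(color))                 # inverted:   #9a432b
    print(more_green(color))             # more green: #65dcd4
```

Scaling channels directly is the crudest possible way to darken or lighten a color, but it shows the point: once the color is a data structure, each of these operations is a line or two of arithmetic.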

What I’ve described is what happens when powerful machines meet powerful data structures. A data structure, in this case a hex code, is useless on its own. You could easily have written down “#65bcd4” 200 years ago, but it would not have been usable the way it is today. The opposite is also true: it would be very difficult to create computer screens or cameras without defining an abstract data structure to represent color. The benefit of data structures is that they give us–and machines–an easy way to manipulate and duplicate state from the physical world.

For most of recorded history, humans were the main machines that operated on data structures. Language is perhaps the biggest example of structured data that is arguably useless on its own, but incredibly powerful when paired with the right machines. Another example is music notation (with the accompanying abstractions such as notes and units of time), where humans (often paired with instruments) are the decoding machines. However, things are changing, and non-human machines can now read, write, and manipulate many forms of data that were once the exclusive domain of humans.

The right combination of machines and data structures can be truly revolutionary. For example, the ability to capture images and videos from the world, manipulate them in abstract form within a computer program, and then display the resulting imagery on a screen has transformed how people live their lives.

A good data structure can even shape how people think about the physical world. For example, it is not obvious that defining colors in terms of three basis components should work at all; getting it right requires knowledge of how the human eye perceives light. But now that we have done this research, any human who learns about the XYZ color space will have a clear picture of exactly which human-perceivable colors can be created and how they relate to one another.
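As a rough illustration of what “three basis components” looks like in practice, here is a small Python sketch that maps a screen color into XYZ. It assumes the input is an sRGB color (the essay does not say which RGB space a hex code lives in), and it uses the commonly published four-decimal rounding of the sRGB-to-XYZ matrix for a D65 white point. The function names are mine.

```python
def srgb_to_linear(channel):
    """Undo the sRGB transfer curve for one 0..255 channel, giving linear light in 0..1."""
    c = channel / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def srgb_to_xyz(r, g, b):
    """Convert 8-bit sRGB channels to XYZ (assumed D65 white point, Y = 1.0 for white)."""
    rl, gl, bl = (srgb_to_linear(v) for v in (r, g, b))
    # Each output component is a weighted sum of the three linear inputs.
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    return x, y, z


if __name__ == "__main__":
    # The example color from earlier, #65bcd4, as 8-bit channels.
    print(srgb_to_xyz(0x65, 0xBC, 0xD4))
```

The Y component doubles as a measure of how bright the color looks, which is why green carries the largest weight in that row: it is the research on human vision, baked directly into the numbers.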

It’s worth noting that data structures paired with powerful machines are not always revolutionary, at least not right away. What if, for example, we tried to do the same thing with smell as we did for color: create a data structure for representing smells using base components, and then build machines for producing and detecting smells using our new data structure? Well, this has been tried many times, but has yet to break into the average person’s household. The reasons are a bit unclear to me, but it doesn’t appear to be a pure machine or data structure bottleneck. Sure, we cannot build the right machines, but we also don’t understand the human olfactory system well enough to define a perfect smell data structure.

There are also plenty of examples of existing–although not rigorously defined–data structures that could be brought to life with the right machines. Imagine food 3D printers that can follow a human-written recipe, or machines that apply a full face of makeup to match a photograph. I think millions of people would spend money to download a file and immediately have a face of makeup that looks exactly like Lady Gaga on the red carpet. And these same customers would probably also love to be able to tell Alexa to add more salt to last night’s dinner and print it again next weekend. Here, I think the main bottleneck is that we simply don’t have the machines yet; the data structures themselves are fun and possibly even easy to dream up.

I’d argue that, most of the time, we already have the data structures; the problem is that humans are the only machines that can operate on them. For these sorts of problems, machine learning and robotics may one day offer a reasonable solution. Humans are still the machines making recipes, putting on makeup, cutting hair, etc. The data structures we use for these tasks are often encoded in natural language or images, and putting them into physical form requires dextrous manipulation and intelligence. Even decoding these concepts from the world is sometimes out of reach of current machines (e.g. “describe the haircut in this photograph”). That last example hints that machine learning may also make it easier to translate between different, largely equivalent data structures.

The end result of this mechanization will be truly amazing, and perhaps frightening. Imagine a world, not so far from now, where things we today consider “uncopyable” or “analog” become digital and easy to manipulate. These could include haircuts, food, physical objects, or even memories and personalities. This seems to be the natural conclusion of technological advancement, and I’m excited and horrified to witness some of it in my lifetime.