Part 2: Generating the face of the Implementor
The Implementor
March 2020
Generating a synthetic colleague and learning about the bleeding edge of artificial intelligence
So-called generative models are some of the most exciting elements of artificial intelligence, partly because they are its most visible face, producing amazing visuals, and partly because they hold some of the biggest promise for the future.
The most important AI breakthrough, in my opinion, is adversarial training (also called GAN for Generative Adversarial Networks) … This, and the variations that are now being proposed, is the most interesting idea in the last 10 years in machine learning.
Yann LeCun, Director of AI Research at Facebook and Professor at NYU
And what better way to explore the modern field of generative models than by producing a synthetic colleague? One with a CV and a face, ready to assist in projects anywhere in the world, in any kind of industry! An Implementor!
The main reason we started building a new synthetic colleague is that it's fun. Also, when learning something new, it helps to have a coherent goal to work towards, and the Implementor gives us both reasonably interesting data to work with (CVs) and a concrete end product to aim for (a synthetic colleague).
But why work on generative models in particular? Beyond the obvious point that they produce the most striking and engaging output (fun pictures and text), there is also the very real point that this is where much of the bleeding edge of AI research is happening, more broadly in the realm of unsupervised learning.
What I cannot create, I do not understand.
So said Richard Feynman, and it has been the inspiration for much of the research into generative models.
Fundamentally, a generative model is fed data and learns to generate new data that resembles what it has seen. The key is that the model is vastly simpler than the raw data it is fed, so it is forced to learn an efficient representation of that data. In other words, it has to learn the essence of the data in order to generate it.
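To make this concrete, here is a minimal sketch in PyTorch of one of the simplest models that learns such a representation, an autoencoder (the layer sizes and data are purely illustrative, not the models we actually build later in this series). Because the bottleneck is far smaller than the input, the network can only reconstruct the data by first squeezing it through a compact representation.

```python
import torch
from torch import nn

# Illustrative autoencoder sketch: 784 input values must pass through a
# 32-number bottleneck, so the encoder is forced to learn a compact
# representation ("the essence") of the data in order to reconstruct it.
class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, bottleneck_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, bottleneck_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.rand(16, 784)                     # a batch of fake flattened "images"
loss = nn.functional.mse_loss(model(x), x)  # learn to reconstruct the input
loss.backward()
```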
This representation turns out to be important for almost any modern machine learning task. In order to classify whether an image is a cat or a dog, or whether an invoice is fraudulent or not, a machine needs a general understanding of the world. It needs to understand fur, snouts and ears, and it needs to understand language and syntax.
Usually, when we train a machine learning model, we train it "from scratch". We initialise it randomly, so it knows nothing of the world, and then it begins to learn. This means that an image classification model first needs to learn black and white, then what an edge and a surface look like, and only then can it move on to textures and shapes, to features like snouts and ears, and finally to concepts like cats and dogs.
This is an expensive process: a lot of human time goes into labelling images as cats and dogs, or invoices as fraudulent or not, and much of that hard-won labelled information is then "spent" on learning very basic things like linguistic syntax or what a texture looks like.
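To make the contrast explicit later on, here is a deliberately tiny from-scratch sketch (again PyTorch, with hypothetical layer sizes and random tensors standing in for real labelled examples, not our actual pipeline). Every weight starts out random, so everything from low-level features up to the final cat-or-dog decision has to be learned from the labelled data alone.

```python
import torch
from torch import nn

# From-scratch sketch: the whole network starts from random weights, so every
# layer, from edges and textures up to "cat vs. dog", must be learned from
# expensive labelled examples alone.
model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),   # low-level features, learned from zero
    nn.Linear(128, 32), nn.ReLU(),    # higher-level features, learned from zero
    nn.Linear(32, 2),                 # the final cat-vs-dog decision
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

labelled_x = torch.rand(64, 784)      # every example here needed a human label
labelled_y = torch.randint(0, 2, (64,))

loss = nn.functional.cross_entropy(model(labelled_x), labelled_y)
loss.backward()
optimizer.step()
```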
Generative models (or, more broadly, unsupervised pre-training) are as close as we get to a free lunch in machine learning. By first having a model learn to generate invoices or pictures of cats and dogs, it learns an efficient representation of these concepts. It won't be able to tell us outright whether something is a cat, a dog or a fraudulent invoice, but it will have a broad representation of the concept and will merely require some "fine-tuning" to get good results.
In practical terms, this means that we can use cheap, unlabelled data at scale to "pre-train" the model, letting it learn efficient representations of the general concepts, and then use the expensive (human-)labelled data only for the final tuning of the model, greatly increasing the value we get out of the data at hand.
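As a rough sketch of that workflow (still PyTorch, with hypothetical sizes and a made-up fraudulent-invoice label, not our actual models): pretend the encoder's weights come from unsupervised pre-training on cheap, unlabelled data, freeze them, and train only a small classification head on the few expensive labelled examples, in contrast to the from-scratch sketch above where every weight had to be learned from labels.

```python
import torch
from torch import nn

# Fine-tuning sketch: reuse a pre-trained encoder (its weights are assumed to
# come from unsupervised pre-training) and train only a small classification
# head on the scarce, human-labelled data.
encoder = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 32),
)  # stand-in for an encoder pre-trained on unlabelled data
for param in encoder.parameters():
    param.requires_grad = False         # freeze the general representation

head = nn.Linear(32, 2)                 # e.g. fraudulent vs. legitimate invoice
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

labelled_x = torch.rand(8, 784)         # a small, human-labelled batch
labelled_y = torch.randint(0, 2, (8,))

logits = head(encoder(labelled_x))
loss = nn.functional.cross_entropy(logits, labelled_y)
loss.backward()
optimizer.step()
```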
This is a very exciting promise and a very good theory that has so far seen mixed results in practice. In this series, we'll cover a very successful example (language modelling) and an example that, while exciting, was overtaken by other methods before it fully matured (image generation).