One of the most interesting sessions at Adobe MAX is traditionally the Sneaks keynote, where engineers from the company's various units show off their most cutting-edge work. Sometimes, those turn into products. Sometimes they don't. These days, a lot of the work focuses on AI, often based on the Adobe Sensei platform. This year, the company gave us an early look at Project Sweet Talk, one of the featured sneaks of tonight's event.
The idea here is pretty straightforward, but hard to pull off: take a portrait, either a drawing or a painting, identify the different parts of the face, then animate the mouth in sync with a voice-over. Today, Adobe's Character Animator (which you may have seen on shows like The Late Show with Stephen Colbert) does some of that, but it's limited in the number of animations, and the result, even in the hands of the best animators, doesn't always look all that realistic (as far as that's possible for the kind of drawings you animate in the product). Project Sweet Talk is far smarter. It analyzes the voice-over and then uses its AI smarts to realistically animate the character's mouth and head.
The team, led by Adobe researcher Dingzeyu Li, together with Yang Zhou (University of Massachusetts, Amherst) and Jose Echevarria and Eli Shechtman (Adobe Research), fed their model thousands of hours of YouTube video of real people talking to the camera. Surprisingly, that model transferred really well to drawings and paintings -- even though the faces the team worked with, including pretty basic drawings of animal faces, don't really look like human faces.
"Animation is hard and we all know this," Li told me. "And we all know that if we want to align a face with a given audio track, it is even harder. Adobe Character Animator already has a feature called 'compute lip sync from scene audio,' and that shows you what the limitations are." The existing feature in Character Animator only moves the mouth, while everything else remains static. That's obviously not a very realistic look. If you look at the examples embedded in this post, you'll see that the team smartly warps the faces automatically to make them look more realistic -- all from a basic JPG image.
Because of this face warping, Project Sweet Talk doesn't work all that well on photos. They just wouldn't look right -- which also means there's no need to worry about anybody abusing the project for deepfakes. "To generate a realistic-looking deepfake, a lot of training data is needed," Li told me. "In our case, we only focus on the landmarks, which can be predicted from images -- and landmarks are sufficient to animate animations. But in our experiments, we find that landmarks alone are not enough to generate a realistic-looking [animation based on] photos."
Chances are, Adobe will build this feature into Character Animator in the long run. Li also tells me that building a real-time system -- similar to what's possible in Character Animator today -- is high on the team's priority list.