VFX artists show that Hollywood can use AI to create, not exploit
Hollywood may be embroiled in ongoing labor disputes that involve AI, but the technology infiltrated film and TV long, long ago. At SIGGRAPH in LA, algorithmic and generative tools were on display in countless talks and announcements. We may not know where the likes of GPT-4 and Stable Diffusion fit in yet, but the creative side of production is ready to embrace them — if it can be done in a way that augments rather than replaces artists.
SIGGRAPH isn't a film and TV production conference, but one about computer graphics and visual effects (for 50 years now!), and the topics naturally have overlapped more and more in recent years.
This year, the elephant in the room was the strike, and few presentations or talks got into it; however, at afterparties and networking events it was more or less the first thing anyone brought up. Even so, SIGGRAPH is very much a conference about bringing together technical and creative minds, and the vibe I got was "it sucks, but in the meantime we can continue to improve our craft."
The fears around AI in production are not exactly illusory, but they are certainly a bit misleading. Generative AI such as image and text models has improved greatly, leading to worries that it will replace writers and artists. And certainly studio executives have floated harmful (and unrealistic) hopes of partly replacing writers and actors using AI tools. But AI has been present in film and TV for quite a while, performing important and artist-driven tasks.
I saw this on display in numerous panels, technical paper presentations and interviews. Of course a history of AI in VFX would be interesting, but for the present here are some ways AI in its various forms was being shown at the cutting edge of effects and production work.
Pixar's artists put ML and simulations to work
One early example came in a pair of Pixar presentations about animation techniques used in their latest film, Elemental. The characters in this movie are more abstract than most, and building a person out of fire, water or air is no easy task. Imagine wrangling the fractal complexity of these substances into a body that can act and express itself clearly while still looking "real."
As animators and effects coordinators explained one after another, procedural generation was core to the process, simulating and parameterizing the flames or waves or vapors that made up dozens of characters. Hand sculpting and animating every little wisp of flame or cloud that wafts off a character was never an option — this would be extremely tedious, labor-intensive and technical rather than creative work.
But as the presentations made clear, although they relied heavily on sims and sophisticated material shaders to create the desired effects, the artistic team and process were deeply intertwined with the engineering side. (They also collaborated with researchers at ETH Zurich for the purpose.)
One example was the overall look of one of the main characters, Ember, who is made of flame. It wasn't enough to simulate flames or tweak the colors or adjust the many dials to affect the outcome. Ultimately the flames needed to reflect the look the artist wanted, not just the way flames appear in real life. To that end they employed "volumetric neural style transfer" or NST; style transfer is a machine learning technique most will have experienced by, say, having a selfie changed to the style of Edvard Munch or the like.
In this case the team took the raw voxels of the "pyro simulation," or generated flames, and passed them through a style transfer network trained on an artist's expression of what they wanted the character's flames to look like: more stylized, less simulated. The resulting voxels have the natural, unpredictable look of a simulation but also the unmistakable cast of the artist's choice.
Simplified example of NST in action adding style to Ember's flames. Image Credits: Pixar
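For readers curious what "style transfer on voxels" means mechanically, here is a toy sketch of the general idea, not Pixar's method. It follows the classic formulation of neural style transfer, in which "style" is summarized by a Gram matrix of feature correlations and the input is nudged toward the target's Gram matrix while a content term keeps it close to the original. Everything specific here is an assumption made for illustration: the hand-made `features` function stands in for a trained network, the 3x3x3 grids stand in for a real pyro simulation and an artist's reference, and the names `stylize`, `style_loss` and `content_weight` are hypothetical.

```python
import random

def features(grid):
    """Hypothetical stand-in for network features on a cubic voxel grid:
    density plus wrap-around forward differences along x, y and z."""
    n = len(grid)
    dens, dx, dy, dz = [], [], [], []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                v = grid[i][j][k]
                dens.append(v)
                dx.append(grid[(i + 1) % n][j][k] - v)
                dy.append(grid[i][(j + 1) % n][k] - v)
                dz.append(grid[i][j][(k + 1) % n] - v)
    return [dens, dx, dy, dz]

def gram(feats):
    """Gram matrix: average product of every pair of feature channels."""
    n = len(feats[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in feats]
            for fi in feats]

def style_loss(grid, target_gram):
    """Squared difference between the grid's Gram matrix and the target's."""
    g = gram(features(grid))
    return sum((a - b) ** 2
               for row_a, row_b in zip(g, target_gram)
               for a, b in zip(row_a, row_b))

def content_loss(grid, sim):
    """Mean squared difference from the original simulation."""
    n = len(grid)
    return sum((grid[i][j][k] - sim[i][j][k]) ** 2
               for i in range(n) for j in range(n) for k in range(n)) / n ** 3

def total_loss(grid, sim, target_gram, content_weight):
    return style_loss(grid, target_gram) + content_weight * content_loss(grid, sim)

def stylize(sim, artist, steps=30, lr=0.5, content_weight=0.1, eps=1e-4):
    """Gradient descent on the voxels, using a numerical gradient and a
    backtracking step so the total loss never increases."""
    n = len(sim)
    target = gram(features(artist))
    grid = [[row[:] for row in plane] for plane in sim]  # copy of the sim
    for _ in range(steps):
        base = total_loss(grid, sim, target, content_weight)
        grad = {}
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    old = grid[i][j][k]
                    grid[i][j][k] = old + eps
                    bumped = total_loss(grid, sim, target, content_weight)
                    grid[i][j][k] = old
                    grad[(i, j, k)] = (bumped - base) / eps
        step = lr
        while step > 1e-8:
            trial = [[[grid[i][j][k] - step * grad[(i, j, k)]
                       for k in range(n)] for j in range(n)] for i in range(n)]
            if total_loss(trial, sim, target, content_weight) < base:
                grid = trial
                break
            step /= 2
        else:
            break  # no improving step found; stop early
    return grid

random.seed(0)
n = 3
# Toy "simulation": random densities. Toy "artist reference": a checker pattern.
sim = [[[random.random() for _ in range(n)] for _ in range(n)] for _ in range(n)]
artist = [[[1.0 if (i + j + k) % 2 == 0 else 0.0 for k in range(n)]
           for j in range(n)] for i in range(n)]

artist_gram = gram(features(artist))
styled = stylize(sim, artist)
before = style_loss(sim, artist_gram)
after = style_loss(styled, artist_gram)
print("style loss before:", before, "after:", after)
```

The content term is what keeps the result recognizably the original simulation rather than a copy of the reference, which mirrors the trade-off described above: the unpredictability of a sim, with the cast of the artist's choice. A production system would use learned features and far larger grids.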
Of course the animators are sensitive to the idea that they just generated the film using AI, which is not the case.
"If anyone ever tells you that Pixar used AI to make Elemental, that's wrong," said Pixar's Paul Kanyuk pointedly during the presentation. "We used volumetric NST to shape her silhouette edges."
(To be clear, NST is a machine learning technique we would identify as falling under the AI umbrella, but the point Kanyuk was making is that it was used as a tool to achieve an artistic outcome — nothing was simply "made with AI.")
Later, other members of the animation and design teams explained how they used procedural, generative or style transfer tools to do things like recolor a landscape to fit an artist's palette or mood board, or fill in city blocks with unique buildings mutated from "hero" hand-drawn ones. The clear theme was that AI and AI-adjacent tools were there to serve the purposes of the artists, speeding up tedious manual processes and providing a better match with the desired look.
AI accelerating dialogue
Images from Nimona, which DNEG animated. Image Credits: DNEG
I heard a similar note from Martine Bertrand, senior AI researcher at DNEG, the VFX and post-production outfit that most recently animated the excellent and visually stunning Nimona. She explained that many existing effects and production pipelines are incredibly labor-intensive, in particular look development and environment design. (DNEG also did a presentation, "Where Proceduralism Meets Performance," that touches on these topics.)
"People don't realize that there's an enormous amount of time wasted in the creation process," Bertrand told me. Working with a director to find the right look for a shot can take weeks per attempt, and infrequent or bad communication during that time often leads to those weeks of work being scrapped. It's incredibly frustrating, she continued, and AI is a great way to accelerate this and other processes whose output is nowhere near a final product but is simply exploratory and general.
Artists using AI to multiply their efforts "enables dialogue between creators and directors," she said. Alien jungle, sure — but like this? Or like this? A mysterious cave, like this? Or like this? For a creator-led, visually complex story like Nimona, getting fast feedback is especially important. Wasting a week rendering a look that the director rejects a week later is a serious production delay.
In fact new levels of collaboration and interactivity are being achieved in early creative work like pre-visualization, as one talk by Sokrispy CEO Sam Wickert explained. His company was tasked with doing pre-vis for the outbreak scene at the very start of HBO's "The Last of Us" — a complex "oner" in a car with countless extras, camera movements and effects.
While the use of AI was limited in that more grounded scene, it's easy to see how improved voice synthesis, procedural environment generation and other tools could and did contribute to this increasingly tech-forward process.
Final shot, mocap data, mask and 3D environment generated by Wonder Studio. Image Credits: Wonder Studio
Wonder Dynamics, which was cited in several keynotes and presentations, offers another example of machine learning at work in production, entirely under the artists' control. Advanced scene and object recognition models parse normal footage and instantly replace human actors with 3D models, a process that once took weeks or months.
But as they told me a few months ago, the tasks they automate are not the creative ones; it's grueling rote (sometimes roto) labor that involves almost no creative decisions. "This doesn't disrupt what they're doing; it automates 80-90% of the objective VFX work and leaves them with the subjective work," co-founder Nikola Todorovic said then. I caught up with him and his co-founder, actor Tye Sheridan, at SIGGRAPH, and they were enjoying being the toast of the town: it was clear that the industry was moving in the direction they had started off in years ago. (Incidentally, come see Sheridan on the AI stage at TechCrunch Disrupt in September.)
That said, the warnings of striking writers and actors are in no way being dismissed by the VFX community. VFX artists echo them, in fact, and their concerns are similar, if not quite as existential. For an actor, one's likeness or performance (or for a writer, one's imagination and voice) is one's livelihood, and the threat of it being appropriated and automated entirely is a terrifying one.
For artists elsewhere in the production process, the threat of automation is also real, and also more of a people problem than a technology one. Many people I spoke to agreed that bad decisions by uninformed leaders are the real problem.
"AI looks so smart that you may defer your decision-making process to the machine," said Bertrand. "And when humans defer their responsibilities to machines, that's where it gets scary."
If AI can be harnessed to enhance or streamline the creative process, such as by reducing time spent on repetitive tasks or enabling creators with smaller teams or budgets to match their better-resourced peers, it could be transformative. But if the creative process is seconded to AI, a path some executives seem keen to explore, then despite the technology already pervading Hollywood, the strikes will just be getting started.