OpenAI releases text-to-video tool ‘Sora’ and instantly generates a wave of social media spam

A few hours ago OpenAI launched its text-to-video tool ‘Sora’ (from the Japanese word 空, meaning sky), and predictably AI fanatics spammed filmmakers’ forums on Reddit and YouTube claiming that the film industry’s days are numbered and that the end of the camera is coming too (by ‘the film industry’ they really mean Hollywood, often with an antisemitic undertone; they never apply this wild idea to filmmakers in the rest of the world).

Madness.

It should be noted, first of all, that these wild opinions come mostly from 40-to-50-year-old men (the typical AI YouTuber demographic) who are stuck in front of a computer all day, aren’t filmmakers, and have no real passion for film or storytelling. They’re the people generating novels with ChatGPT and flooding Amazon with synthetic books that nobody wants to buy, or churning out AI art that doesn’t sell. They’re the people who sent ChatGPT-generated short stories and Midjourney artwork to Clarkesworld Magazine and got banned from ever submitting again.

Let’s clarify what we are seeing here. Sora is neat, and I really like what it can do. Who wouldn’t be impressed? But it is a VFX and CGI tool. The term ‘computer-generated imagery’ applies more aptly to generative AI than it ever did to 3D modelling and animation. I’m sure someone out there with a big budget and plenty of time might make something impressive, but it would be great if people who don’t understand filmmaking or audiences would stop jumping to conclusions.

With the Sora demos you are seeing a curated sample of low-resolution videos over which the prompter has very little control. As with all generative AI tools, the full output is impossible to predict, which means a lot of credits and compute time are wasted regenerating over and over again. If the people who were bowled over by Sora had looked closer, they would have noticed videos replete with errors: an Asian woman who sometimes had two left feet, street signs with nonsense logos, billboards featuring non-existent gibberish hanzi/kanji characters, and so on.

These issues are not easily fixed, because the possible combinations of elements that can make up a scene are infinite. Bugs in the output will be a persistent problem, just as bugs in software can never be fully quashed. Then you have the biggest compute problem: animating character performances and mouth movements, which requires real-time feedback, audio sync, and the equivalent of doing multiple takes and shoots to get a character’s performance exactly where a storyteller wants it to be. The best film directors have had their actors do numerous takes to fine-tune a performance, and the same applies in 3D CGI and here in GenAI.

There are also no controls for real-time camera movement, changing the camera angle, or adjusting focal length and depth of field. Prompting is the slowest and most inefficient way to do these things in a virtual environment.

Films like Chariots of Fire had cuts within the same scene shot at 24fps, 72fps and over 200fps. In anime, different animation layers/cels within the same shot are animated on ones, twos and threes (a new drawing every one, two or three frames). Generative video lacks these fine controls over frame rates and keyframing, making the output, compute demands and costs hard to predict.

Other issues with using Sora include colour grading. Film editors and colourists understand how important log/raw footage is to the grading process. Art directors are always requesting sudden changes, and those changes need to be made in real time. GenAI tools like Sora can only output compressed video, and you can’t change anything until you see the output. If you ask Sora to regenerate a video with a different colour scheme, the video itself might not match the last one: objects and characters could be in different positions, with different errors from the previous generation.

Once a compressed video has been generated, it is much harder to change the grade or colour-correct, and the quality deteriorates with every pass. That’s OK for social media on a phone screen, but not for cinema. On a big screen even the smallest errors and artefacts are distracting. You don’t want audiences walking out of a screening because of quality issues, not purchasing titles because of bad reviews, or asking for refunds.

Even The Guardian’s article on Sora was hyperbolic, stating that the tool could generate video ‘instantly’, never mind that no video content can be generated instantly, and that the videos themselves are extremely low resolution and feature a large number of errors that OpenAI itself highlights on Sora’s page: ‘Sora might struggle with simulating the physics of a complex scene or understanding cause and effect in specific scenarios. Spatial details in a prompt may also be misinterpreted, and Sora may find precise descriptions of events over time challenging.’ Fixing those errors will require an extraordinary amount of engineering and an unspeakably large amount of compute power, especially for generating video in native 4K or 8K with HDR support.

The film industry isn’t going anywhere, and neither is filming actors on location and on practical sets. People pay to watch actors (including motion-captured ones), and that’s not changing. When it comes to consuming entertainment, media and literature, we’re talking about a shared human cultural connection. ChatGPT-generated books don’t sell well because readers want to connect with real authors.

I grew up transitioning from celluloid and paint to digital photography and software. I learn every new technology that comes around, but because I have lived through all these cycles and understand consumers from both a business and a fan perspective, I never fall for hype.

I remember clearly in the late 90s when there was a fear, all over the media, that CGI would replace actors. Paul Newman had it written into his family estate that if technology ever allowed him to be revived after he passed away, permission would never be granted; he wanted his likeness to remain his own. In 2002, Andrew Niccol of Gattaca fame made ‘Simone’, a satire about synthetic actors and what we would now call generative AI.

It never happened, of course. CGI at its best is excellent, but it is never quite good enough; all it did was complement film. James Cameron worked as hard as possible to make Avatar lifelike, yet even the sequel looks like a high-resolution video game composited with film footage. Hard surfaces are much easier to recreate than organic lifeforms, or even water.

No new technology completely displaces what came before it. Classical instruments weren’t killed by synthesisers and Logic Pro samples. Ebooks didn’t kill books. Streaming didn’t kill vinyl records (vinyl ended up outliving the mighty iPod!). The best film directors in the world still shoot on film. Things co-exist.

We were also told that CGI would replace all traditional animation. When interviewed about his anthology ‘Memories’, which used CGI for a number of difficult shots, the great mangaka and Akira director Katsuhiro Otomo said: ‘Because I draw pictures, I don’t have a plan to move away from 2D to the main use of 3DCG. 3DCG anime is like animating dolls, so people like me who have thought through the use of drawings do not have much idea about it. To begin with, it’s certainly true that Japanese like pictures with “contour lines”. I don’t think 2D anime will be rendered entirely obsolete, but it will stay as one of a number of diverse choices.’

Over 25 years later, 2D anime is more popular than ever and Otomo’s words were prophetic. Yet despite all the evidence, you can still find tech fetishists in software engineering who will insist that 2D animation and film photography no longer exist. They live in a bubble so small and tight that they can’t see the world outside.

At the end of the day, audiences and consumers decide what becomes successful. You can have the most outrageous technology in the world, but if your content annoys the public it won’t sell. Look at Ghost in the Shell. In 2002 they did a CGI update of the anime classic; fans hated it and will always prefer the original. A decade later they did a live-action remake with even better CGI; fans hated that too, this time for a variety of reasons.

VFX tools like these video/image generators can be incorporated into your work, and if you do it smartly it is no different from when Ryuichi Sakamoto pioneered electronic music. He never abandoned playing a classical piano in front of an audience, though; he did so until his last days, and he posthumously continues to perform on his piano in the mixed-reality concert experience ‘Kagami’.

Learn everything, absorb what is useful, incorporate technology into traditional arts and crafts, reject hyperbole, lovingly handcraft things that people will love, and don’t spam. The better your work is, and the more of yourself you put into it, the more fans will reward you for it. Generative AI will have a permanent image problem, associated as it is with spam, memes mocking AI errors, trolls hiding behind AI to taunt creative workers, misinformation and climate impacts. You won’t have this image problem.