Yesterday Was Better

I’m continuing my deep dive into generative video filmmaking.

I’ve been getting very acquainted with the newest tools and services offered by companies such as Runway and the newcomer Luma. I’m augmenting them with ComfyUI, which runs locally on my Mac and PC.

Everything I’ve said previously still holds true and will remain relevant. I’ll be doing a full write-up once this deep dive is done.

As always, I’m not anti-“AI”. We need machine learning. I do not subscribe to the anti-AI feelings of some in the creative and artistic communities online. Used properly and responsibly, these tools can be immensely useful for VFX work, cover shots, background plates and so on.

As with the brief experiment in my last blog post, generative video works better when you supply it with a still image: your own fine art, drawing, CGI, photography or a generated picture. By giving the models visual guidance along with prompts you can control them much better. However, the discard or error rate is still very high; something like 90% of the time you will either be frustrated by the output or laugh at it. Future blooper reels really should include generative errors.
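For anyone curious what “a still image plus a prompt” looks like in practice, here is a minimal sketch of submitting one programmatically. Everything in it is a placeholder for illustration: the endpoint, field names and response shape are not taken from Luma’s or Runway’s actual documentation, so treat it as Python-shaped pseudocode rather than a working client.

```python
import base64
import requests

# Hypothetical image-to-video submission. The URL, field names and auth scheme
# below are illustrative placeholders, not any real vendor's documented API.
API_URL = "https://example.com/v1/image-to-video"
API_KEY = "YOUR_API_KEY"

def queue_clip(still_path: str, prompt: str) -> str:
    """Send a reference still plus a text prompt; return a job id to poll."""
    with open(still_path, "rb") as f:
        still_b64 = base64.b64encode(f.read()).decode("ascii")

    payload = {
        "prompt": prompt,         # text guidance
        "init_image": still_b64,  # the still that anchors framing and lighting
        "duration_seconds": 5,
    }
    resp = requests.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["job_id"]

if __name__ == "__main__":
    print(queue_clip("keyframe.png", "night street, neon rain, slow dolly-in"))
```

The point is simply that the still does most of the steering; the prompt refines it.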

Even on those rare occasions when the model gives you something great, you’ll need to do a lot of clean-up and editing to make it useful for storytelling. As mentioned before, generative tools are VFX/CGI tools and that’s the approach to take with them.

For the time being I’m going to post below a comparison of Luma’s and Runway’s output. I tried to replicate a scene from John Woo’s classic film ‘A Better Tomorrow’, starring the inimitable Chow Yun-fat.

I did not supply a reference image to either service. I wanted to test pure prompting to see what kind of fidelity and errors they could create from a description alone. The prompt was:

‘a Hong Kong movie, night, outdoors, a Chinese man has bruised face, one side of his face has bandages, he is angry and arguing, he wears a blue jacket and pink shirt, close up angle’

Here are the results: the original footage on top, Luma’s output in the middle and Runway’s at the bottom.


Luma’s output had vigour! They even generated a motorcycle accident in the background! Of course, it’s distorted garbage that you couldn’t add a voice to. If you used some of the generative lip-syncing tools on offer at the moment, the footage would look even more comical. Pause the video at any point and the character’s face looks like a butcher’s worktop.

Runway’s output looks more realistic (except in the background, where they added a drunk man in a gold shirt walking backwards), but it is bloodless, emotionless, and the majority of the instructions in the prompt were not followed. Runway refused to show blood, violence or anger and repeatedly flagged my prompt for asking.

If you were a film director and your actor, make-up artists and stylists were this disobedient, you would fire them. In the world of generative filmmaking, however, you have to exercise a lot of patience alongside wishful thinking and wasted money (in the form of the fixed allotment of credits these services supply you with each month).

We soon find out, from Runway’s own website, why it performed so badly at this task:

If Runway’s Gen-3 is meant to help storytellers tell stories with machine learning tools, they won’t be able to tell a story like Fight Club, American History X, American Psycho, Takashi Miike’s Audition or Takeshi Kitano’s Sonatine.

The tools are far, far, far from ready for character work unless you only require the most minimal amount of movement and little or no dialogue. You will not be producing AI video with a synthetic Chow Yun-fat, Toshiro Mifune or Al Pacino in the foreseeable future (and getting an audience to pay for that is another big problem).

Yesterday was better than this. Knowing the talent of John Woo and Chow Yun-fat, it is quite possible that they only needed to film one take for the scene above. It would have been rehearsed in person and mentally many times, but getting the shot done would not have been like using a slot machine, which is how it feels to use video and image generators.

I will end this blog post with a more successful case. As mentioned above, if all you require is minimal movement and almost no dialogue, some interesting things can be created.

The very short Wong Kar-wai-inspired film I crafted below started locally on my computer with ComfyUI. After many iterations I took the best images into Photoshop for clean-ups and fixes. Then I took the images to Luma’s Dream Machine to add motion. After several iterations I took the footage with the fewest errors into After Effects and Photoshop for more clean-ups, and finally edited the shots in Premiere, adding Wong Kar-wai’s signature picture-book slow motion, music and voice-over.
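The ComfyUI stage of that loop can be partly automated. The sketch below assumes a default local install listening on 127.0.0.1:8188 and a workflow exported from the UI with “Save (API Format)”; the filename is hypothetical and the node names in your own graph may differ.

```python
import json
import random
import requests

# Queue several seed variations of an exported ComfyUI workflow so there are
# plenty of candidate stills to choose from before the Photoshop pass.
COMFY_URL = "http://127.0.0.1:8188/prompt"  # default local ComfyUI address
WORKFLOW_FILE = "wkw_still_api.json"        # hypothetical exported workflow

def queue_variations(n: int = 8) -> None:
    with open(WORKFLOW_FILE, "r", encoding="utf-8") as f:
        workflow = json.load(f)

    for _ in range(n):
        # Randomise the seed on every sampler node so each run yields a new still.
        for node in workflow.values():
            if node.get("class_type") == "KSampler":
                node["inputs"]["seed"] = random.randint(0, 2**32 - 1)

        resp = requests.post(COMFY_URL, json={"prompt": workflow}, timeout=30)
        resp.raise_for_status()
        print("queued:", resp.json().get("prompt_id"))

if __name__ == "__main__":
    queue_variations()
```

The downstream steps (Dream Machine, After Effects, Premiere) stay manual; this only takes the tedium out of generating enough raw candidates.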

For now I call it ‘Farewell My Robot Concubine’. There’s a little twist in the reveal.

It was a fun experiment and I have been thinking of extending it with more scenes, but damn, those credits keep going down, down, down. In another blog post I’ll reveal more production details, including bloopers.