OpenAI releases text to video tool 'Sora' and instantly generates a wave of social media spam

A few hours ago OpenAI launched its text-to-video tool ‘Sora’ (from the Japanese word 空, meaning sky), and predictably AI fanatics spammed film makers’ forums on Reddit and YouTube claiming that the film industry’s days are numbered (by which they really mean Hollywood Jews; they never apply this wild idea to film makers in the rest of the world) and that the end of the camera is coming too.

Madness.

It should be noted, first of all, that those wild opinions come mostly from 40–50 year old men (the typical AI YouTuber demographic) stuck in front of a computer all day who aren’t film makers and have no real passion for film or storytelling. They’re the people generating novels with ChatGPT and flooding Amazon with synthetic books that nobody wants to buy, or making AI art that doesn’t sell. They’re the people who sent ChatGPT-generated short stories and Midjourney artwork to Clarkesworld Magazine and then got banned from ever submitting again.

Let’s clarify what we are seeing here. Sora is neat, and I really like what it can do. Who couldn’t be impressed? But it is a VFX and CGI tool. The term ‘computer generated imagery’ is more applicable to generative AI than 3D modelling and animation ever was. I’m sure someone out there with a big budget and time might make something neat, but it would be great if people who don’t understand filmmaking or audiences would stop jumping to conclusions.

With demos of Sora you are seeing a curated sample of low-res videos over which the prompter has very little control. Like all generative AI tools, it is impossible to predict the full output, which means a lot of credits and compute time are wasted generating over and over again. If the people who were bowled over by Sora had looked closer, they would have noticed videos replete with errors: an Asian woman who sometimes had two left feet, street signs with nonsense logos, billboards featuring non-existent gibberish hanzi/kanji characters, and so on.

These issues are not easily fixed, because the possible combinations of elements that can make up a scene are infinite. Bugs in the output will be a persistent problem, just as bugs in software can never be fully quashed. Then you have the biggest compute problem: animating character performances and mouth movements, which requires real-time feedback, audio sync and the equivalent of doing multiple takes and shoots to get a character’s performance exactly where a storyteller wants it to be. The best film directors have had their actors do numerous takes to fine-tune a performance, and the same applies in 3D CGI and here in generative AI.

There are also no controls for real time camera movement, changing the angle of a camera or changing the focal length or depth of field. Prompting is the most inefficient and slowest way to do these things in a virtual environment.

Films like Chariots of Fire had cuts within the same scene shot at 24fps, 72fps and over 200fps. In anime, different animation layers/cels are animated on ones, twos and threes within the same shot. Generative video lacks these fine controls for frame rates and keyframing, making the output, compute demands and costs hard to predict.
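To make the ones/twos/threes point concrete, here is a minimal sketch (my own illustration, not drawn from any real animation pipeline) of how a shot’s layers can run at different effective rates within the same 24fps timeline. The function name and layer labels are hypothetical:

```python
# Illustrative only: expanding drawings held "on ones/twos/threes"
# into a 24 fps frame sequence. hold=1 means a new drawing every frame,
# hold=2 every second frame, hold=3 every third frame.

def expand_timing(drawings, hold):
    """Repeat each drawing `hold` times to build the frame sequence."""
    return [d for d in drawings for _ in range(hold)]

# One shot can mix timings per layer, so layers produce different
# numbers of frames from the same number of drawings:
character = expand_timing(["c1", "c2", "c3"], 3)  # on threes -> 9 frames
effects   = expand_timing(["e1", "e2", "e3"], 1)  # on ones   -> 3 frames

print(len(character), len(effects))  # prints: 9 3
```

A prompt-driven generator exposes no such per-layer timing control, which is the fine-grained predictability the paragraph above is describing.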

Other issues with using Sora include colour grading. Film editors and colourists understand how important log/raw footage is for the grading process. Art directors are always requesting sudden changes, and those changes need to be made in real time. GenAI tools like Sora can only output compressed video, and you can’t change anything until you see the output. If you ask Sora to regenerate a video with a different colour scheme, the video itself might not match the last one: objects and characters could be in different positions, with different errors from the previous generation.

After generating a compressed video it is much harder to change the grade or colour-correct; the quality will deteriorate. That’s OK for social media on a phone screen, but not for cinema. On a big screen even the smallest errors and artefacts are distracting. You don’t want audiences walking out of a screening because of quality issues, not purchasing titles because of bad reviews, or asking for refunds.

Even The Guardian’s article on Sora was hyperbolic, stating that the tool could generate video ‘instantly’, never mind that no video content can be generated instantly, and that the videos themselves are extremely low resolution and feature a large number of errors that OpenAI itself highlights on Sora’s page: ‘Sora might struggle with simulating the physics of a complex scene or understanding cause and effect in specific scenarios. Spatial details in a prompt may also be misinterpreted, and Sora may find precise descriptions of events over time challenging.’ Fixing those errors will require an extraordinary amount of engineering and an unspeakably large amount of compute, especially for native 4K or 8K video with HDR support.

The film industry isn’t going anywhere and neither is filming actors on locations and practical sets. People pay to watch actors (including motion captured) and that’s not changing. When it comes to consuming entertainment, media and literature we’re talking about a shared human cultural connection. ChatGPT generated books don’t sell well because readers want to connect with real authors.

I grew up transitioning from celluloid and paint to digital photography and software. I learn every new technology that comes around, but because I have lived through all these cycles and understand consumers from a business and fan perspective I never fall for hype.

I remember clearly in the late 90s when there was fear that CGI would replace actors. It was all over the media. Paul Newman had it written into his family estate that if technology ever allowed him to be revived after he passed away, permission would never be granted. He wanted his likeness to remain his own. In 2002 Andrew Niccol of Gattaca fame made a satire called ‘Simone’ about generative AI and synthetic actors.

It never happened, of course. CGI at its best is excellent, but it is never quite good enough; all it did was complement film. James Cameron worked as hard as possible to make Avatar lifelike, but even the sequel looks like a high-resolution video game composited with film footage. Hard surfaces are much easier to recreate than organic lifeforms or even water.

No new technology completely displaces and replaces what came before it. Classical instruments weren’t killed by synthesisers and Logic Pro samples. Ebooks didn’t kill books. Streaming didn’t kill vinyl records (vinyl ended up outliving the mighty iPod!). The best film directors in the world still shoot on film. Things co-exist.

We were also told that CGI would replace all traditional animation. When interviewed about his anthology ‘Memories’ which used CGI for a number of difficult shots, the great mangaka and Akira director Katsuhiro Otomo said ‘Because I draw pictures, I don’t have a plan to move away from 2D to the main use of 3DCG. 3DCG anime is like animating dolls, so people like me who have thought through the use of drawings do not have much idea about it. To begin with, it’s certainly true that Japanese like pictures with ‘contour lines’. I don’t think 2D anime will be rendered entirely obsolete, but it will stay as one of a number of diverse choices.’

Over 25 years later 2D anime is more popular than ever, and Otomo’s words were prophetic. Yet despite all the evidence, you can still find tech fetishists in software engineering who will insist that 2D animation and film photography no longer exist. They live in a bubble so tight and small they can’t see the world outside.

At the end of the day, audiences and consumers decide what becomes successful. You can have the most outrageous technology in the world, but if your content annoys the public it won’t sell. Look at Ghost in the Shell. In 2002 they did a CGI update of the anime classic. Fans hated it and will always prefer the original. A decade later they did a live action remake with even better CGI. Fans hated that too, this time for a variety of reasons.

VFX tools like these video/image generators can be incorporated into your work, and if you do it smartly it is no different from when Ryuichi Sakamoto pioneered electronic music and, with the Roland MC-8 MicroComposer, introduced automated synthesiser playback.

Sakamoto never abandoned playing a classical piano in front of an audience, though, and as a classical composer he scored films such as ‘The Last Emperor’ and ‘Merry Christmas, Mr Lawrence’. He continues to perform posthumously on a grand piano in the mixed reality concert experience ‘Kagami’, combining the classical and the virtual.

Learn everything, absorb what is useful, incorporate technology into traditional arts and crafts, reject hyperbole, lovingly handcraft things that people will love, and don’t spam. The better your work is, the more of yourself that you put into your work, the more fans will reward you for it. Generative AI will have a permanent image problem associated with spam, memes making fun of AI errors, trolls hiding behind AI to mock creative workers, misinformation and climate impacts. You won’t have this image problem.

Claims that AI will replace or displace most jobs are bogus

We are being told by finance bros, Twitter cretins and LinkedIn lunatics that AI (a buzzword that can mean almost anything now) will displace or replace anywhere between 40% and 90% of workers, enhance our productivity and make us more efficient, free us from hard work and give us more leisure time, and, even more absurdly, create a world of abundance for all.

Let’s break those claims down.

  1. It is impossible for “AI” or robots to do tasks that require the level of dexterity and flexibility that only the human mind and musculoskeletal system can pull off; even ChatGPT says it would be impossible. In real life, unlike science fiction, robots need a degree of bulk to be stable and are not good at self-maintenance. AI, being software, will always be buggy, and the more tasks you try to teach a system the more buggy and resource-hungry it becomes. If robots and AI could displace significant numbers of workers, it would come with reduced reliability, reduced dexterity and increased unpredictability in many fields.

  2. Microsoft, Meta and Google are talking up a big game about AI, but just take a look at the state of their platforms and at Windows 11’s bloated and buggy condition. Google and YouTube are happy to host fake and scam ads, so their moderation tools are failing to detect wrongful activity, unless they are allowing it. Instagram is infested with bots and sex pests; their AI moderation doesn’t protect users. Windows 11 still randomly crashes, looks like it was designed by a 12 year old in Microsoft Paint, and users are already trying to uninstall or remove the Bing Chat bloat. That’s an operating system under development for almost five decades and it is a mess, so it’s not hard to imagine how bloated and buggy their future Godputer would be in practice. Keep it far away from military bases and defence departments.

  3. The people making such claims don’t have domain expertise in all the jobs and sectors they are talking about. They are salesmen, newsletter shills and report writers who attach themselves to whatever the latest trend is. Some of them were raised by house servants or at the top of a caste system, so they never learned to respect working people anyway. If their reports and posts are super optimistic and buzzwordy, and if they fail to mention the technical limitations and implementation problems of any new technology, it’s because they don’t really know what they’re talking about. They’re no different from Deepak Chopra talking about quantum physics.

  4. Many sectors are already operating at peak efficiency. We know that because we produce far more goods than we need and generate tons of food, electronic and clothing waste. We actually need to produce less, produce on demand, have more local production, higher quality and longer lasting (if more expensive) goods, more locally repairable goods, and more just-in-time production and shipping. That’s something the polluting Sheins of the world don’t want to hear. AI doesn’t solve this problem; human willpower and cooperation solve this problem. An AI can suggest people take action against overproduction, slavery and pollution (things we already know), but it requires actual people to make the decisions and act.

  5. Whenever sectors do use automation to increase efficiency, the time saved is filled up again by producing more goods, more content, more projects and expanded product lines. Employees do not end up working less, just differently. This is exactly what we have seen in creative workflows. As a production and post-production creative, I have implemented everything from Photoshop Actions to machine-learning-based tools in our workflows to speed up work and reduce mental stress. Those efficiency gains simply allowed companies to ask us to produce more content: a decade ago we produced about 3 images per product; today we are likely to produce up to 6 images per product and an optional video. Implementing machine learning and automation isn’t plain sailing either, and often comes with bugs that are never fully resolved.

  6. At the pharmacy where my brother works they recently installed a state of the art robot for stock tracking and dispensing drugs. It didn’t displace any workers and requires onsite and remote support whenever there’s a hardware or software issue. That’s just how robots are.

  7. “AI will free up our time so we can create art.” Not everyone wants to create art, but remember that one going around on the socials? It didn’t age well, considering the web is now being spammed with AI art that anyone can produce and that often rips off the styles of well known artists. Generative art looks attractive at first sight because our visual cortex is experiencing something ‘new’, but within moments a kind of dire existential dread sinks in, similar to when you get a robocall or a bot sends you a DM. The images are bland, lifeless and have a ghoulish vibe to them. Apparently even AI systems prefer real art and real photography to the generative kind: when they are fed only generative art, their abilities begin to decay.

  8. “Generative AI democratises artistic creation” is another one we sometimes hear, and at first glance it appears to be true, but democratisation of content creation already exists thanks to the plethora of options available. With generative AI (especially in the cloud) you are getting centralisation: it enriches chip makers, rent seekers and energy companies. It pushes up the cost of living while pushing down the cost of labour. It encourages talentless, soulless executives and shareholders to tell artists their skills are nearly worthless. It reduces the quality of creative production and increases the ease with which spam can be generated. It contributes not only to the enshittification of the web but of the whole human experience. LinkedIn is full of 40-something men who generate images of women of colour (i.e. they’re saying please don’t give work to real women of colour) and claim they are part of a youth movement democratising content creation. They tried the same trick with VR, the metaverse, NFTs and crypto, claiming they were democratising finance and being inclusive. They were largely rejected by society and, after causing millions of people to lose money, pivoted to AI.

  9. We could already have a world of abundance. It is the wealthiest who rig economies and create artificial scarcity in order to drive their wealth higher. They’re doing the same with AI: driving up the cost of using software or playing games, increasing energy consumption and making energy scarcer, increasing pollution, and hanging the threat of AI over workers’ heads to scare them into complacency and obedience.

  10. Driving a car is something a teenager learns to do, and for the rest of our lives we mostly drive subconsciously because we rely on known routes, laws and landmarks to assist us. It’s when laws are ignored that bad things happen. The best full self-driving on offer still can’t consistently perform at the level of a law-abiding driver. If AI can’t actually drive cars yet (and in some parts of the world it won’t be possible at all) despite 40+ years of development, then AI won’t be able to do all those jobs that require far more complex real-time reasoning, fluid thinking and dynamic responses than driving does.

  11. If the consuming public were happy with bots replacing people, then athletes and sporting events would have been replaced by bots and virtual sports already. Why spend $20 million on a footballer when a team of computer controlled footballers can play sponsored, advert-filled virtual matches? Golf is an extremely wasteful and inefficient use of land and resources; why can’t that end and be replaced with virtual golf? IBM’s Deep Blue beat Garry Kasparov at chess in 1997. Fast forward 27 years, and there’s still no audience for AI chess players facing each other in AI chess tournaments. The technology has existed for years, but consumers (fans) won’t pay for that. They will pay to watch real athletes struggle to win. Likewise, consumers will always pay more to read books written by real people, not chatbots. They want to build an emotional connection with the author, visit the author at a meet-up, and get a signed copy of the book. A book is not just words on pages.

  12. Finally, if the cost of producing something, whether it is art, literature or clothing, gets closer and closer to nothing, then there’s little incentive for customers to pay you good money for whatever you are offering. Your offerings are a McDonald’s Happy Meal at this point, or worse. The world’s economy can’t be made up entirely of Happy Meal and fast fashion equivalents. Every sector depends on a diversity of goods and services, from the high end and artisanal to the low end mass produced.

I end this blog entry with a video of a delightful lady who runs one of Tokyo’s many popular food joints, ‘Onigiri Bongo’. Japan already has a few restaurants with robot staff (they are gimmicks), but a robot cannot make a thousand onigiri a day without health and safety hazards and mess, cannot build a rapport with customers, and cannot inspire customers to wait in a line outside for an hour every day. Connections, traditions and craft are important.

When I started my first novel Scrivener hadn’t been released yet.

Writing this science fiction novel took me 18 years of reading and research. Scrivener came out after I began working on it, and over the years it proved indispensable for managing all the notes and ideas.

Sometimes I would take a hiatus to read and research other things. Many ideas and scenes were revised or scrapped during those years, but the central theme remained constant. I wanted the story to be not only ahead of its time but also contemporary enough to be relatable, so my bookmarks and notes kept growing and growing.

Finally I decided there was nothing left to study. The novel will be finished this summer. There will also be concept designs and artwork to accompany it.

Thanks to Keith of Literature & Latte for helping me stay organised for so long.

Recovered my old film school DV tape from 2001

Today I managed to finally capture my old film school DV tape. The tape had travelled with me for almost 20 years, from flat to flat and from country to country. I thought it wouldn’t have survived after so long. Tape degrades.

I was about to capture the tape in early 2020, but then covid came along and delayed those plans. I didn’t want to ask someone else to capture it for me; I really wanted to enjoy the process of capturing tape just like I did when I was young. In fact, the first time I ever captured video, computers weren’t powerful enough to transfer live video from a camera; they needed a special Targa capture card to import each frame individually as a Targa sequence.

Finally covid subsided, and after keeping an eye on eBay for a long time I found a Canon XM2 in excellent ‘almost new’ condition at a great price. The short film itself was shot on the XM2’s big brother, the XL1S, but the two cameras are very similar internally. Video capture is somewhat similar to film scanning: you grab a coffee, set up the equipment, and then diligently perform the job of transferring the media into the computer manually.

Connecting the XM2 for capture proved tricky. First, I had to use an old Mac with FireWire. Second, Adobe Premiere dropped miniDV capture a few years ago, and there was no way to install a version of Premiere old enough to still support it. QuickTime does still allow FireWire capture, but I discovered that the start of the tape had degraded from exposure to air and heat. Because of the damage, QuickTime was unable to capture video with audio together, but it could capture the streams separately!

Capture done, you can see how little detail and resolution we worked with in those days. Imagine if we had had 4K or 6K HDR cameras at the time! 🤯 The colours produced by Canon’s 3CCD system were great though. Hardly any grading has been applied to the images below; in some scenes, none at all.
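Just how little resolution is worth spelling out. A quick back-of-the-envelope comparison (assuming a PAL DV frame of 720×576; NTSC would be 720×480) against today’s UHD “4K”:

```python
# Rough pixel-count comparison: PAL miniDV frame vs UHD 3840x2160.
dv_w, dv_h = 720, 576        # PAL DV frame size (assumption; NTSC is 720x480)
uhd_w, uhd_h = 3840, 2160    # UHD "4K"

dv_pixels = dv_w * dv_h      # 414,720 pixels, roughly 0.41 megapixels
uhd_pixels = uhd_w * uhd_h   # 8,294,400 pixels, roughly 8.3 megapixels

print(uhd_pixels // dv_pixels)  # prints: 20
```

In other words, a single UHD frame carries about twenty times the pixels of the DV frames we were shooting in 2001.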

Screenshots from ‘花’ (‘Hana’), a short film in Japanese that I wrote, shot and directed in film school back in 2001, starring my friend Ryoko, who was a news reporter on Nihon TV at the time.

I made a vertical trailer for social media which can be watched below.



Getting it right requires time...and feedback

Even though I've been designing a completely new type of camera system that will be years ahead of anything that currently exists, my favourite camera will probably always be my Leica M3, a custom version with a genuine Italian rosewood body. Whenever I take it to camera shops for servicing it receives the same compliment: 'It's a unique piece!'

Leica M3 (photo taken with an iPhone :p)

I've used just about every type of camera over the years, but what attracted me to the M3 was that Leica put over a decade of research into it because they wanted to make sure they got the M series just right with the very first release (an extremely rare feat for any device). 

Over that decade, Leica frequently interacted with customers to help them design the M3. Users wanted it to be streamlined and ergonomic compared to the irksome and intimidating Leica III. Because the principal market was street photography, the M3 also had to feel second nature to users so that they could quickly capture moments around them. It also had to be easy to repair and recycle.

It was one of the earliest examples of a company asking for global customer feedback and beta testing the hell out of the product. The incredible results of that collaboration haven't been replicated so well since. Evidence of that can be seen in the fact that people still enjoy using the M3 65 years later.

Only one change was made during the M3's life-cycle: a switch from a double-stroke to a single-stroke advance lever (both equally useful). As time went by, Leica added a few more bells and whistles to the M series that weren't possible in the early 1950s, and sometimes they made mistakes in doing so, but the tradition of keeping their renowned product line pure continues today in the M10.