SORA - text-to-video model

https://openai.com/sora

The website is rather comprehensive. Here's a screenshot so you don't have to go to muskworld.

Previous text-to-video was total shit and only 1-2 seconds long. This is really good and up to 1 minute per iteration. Imagine the storytelling possibilities, or the ADS.

Thoughts?

6 Likes

It's going to be used for porn. Just you watch.

7 Likes

IT HAD BETTER be used for porn!

9 Likes

On a serious note did no one watch Terminator?

:man_facepalming:

4 Likes

That better be a seriously detailed prompt. Only 60 seconds. Gonna need to get to the good part fast. :rofl:

3 Likes

Oh boy…
This is gonna get weird.
:rofl:
…as if some of our technology isn’t weird already.

1 Like

That's the plan! :stuck_out_tongue_winking_eye:

On a fishing site I frequent, a question was raised about a bird ID when I logged in this morning. Turns out the bird in question is a SORA.

First time I had ever heard of or seen pictures of a sora.

2 Likes

It isn’t creating video first. It is simulating a world and then recording the result in a video format we can play back.

2 Likes

That’s actually completely inaccurate. They are giving it way too much credit. It actually cannot consistently portray physics realistically, and in fact it is not running a physics simulation at all. It’s just 2D images with no simulation. There is no consistent world simulation, 3D modeling, or world rendering.

What it is actually doing is creating a visual portrayal of the prompt based on analysis of the relative scale of similar depictions in its training set. It has persistence from frame to frame, for up to one minute.

Being based solely on visual relationships of scale, it cannot consistently and realistically depict physical phenomena across multiple instances. Sand, breaking glass, someone drinking or eating, or fluid being poured from differently sized containers are all visuals that it would struggle to recreate with any physical accuracy.

So while they are telling the truth that its visual representation of physics is “intuitive” and “implicit,” that is only because it is baked into their dataset. (Their training dataset also used video created in the Unreal video game engine, with its built-in physics simulations.)

Physics are innate to real video footage, so the model's replication of physical phenomena is basically only an artifact of image generation with consistency from frame to frame across a one-minute video.

The model has no understanding of these phenomena, and technically, no ability to simulate them.
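To make the distinction concrete, here's a toy sketch (obviously nothing like Sora's actual architecture; the functions and numbers are made up for illustration). A physics simulator advances an explicit world state (position, velocity, gravity), so acceleration falls out of the update rule. A purely frame-to-frame generator only carries a visual pattern forward with no underlying state, so it can keep frames consistent with each other while still getting the physics wrong:

```python
def simulate_ball(y0, v0, steps, dt=0.1, g=-9.8):
    """Explicit physics: integrate velocity and position each step."""
    y, v = y0, v0
    frames = []
    for _ in range(steps):
        v += g * dt  # velocity changes under gravity
        y += v * dt  # position changes with velocity
        frames.append(round(y, 3))
    return frames

def extrapolate_frames(frame_a, frame_b, steps):
    """Stand-in for stateless generation: extend the last observed
    visual motion. No velocity or gravity exists anywhere -- just a
    pattern carried forward, so acceleration is never reproduced."""
    delta = frame_b - frame_a
    return [round(frame_b + delta * (i + 1), 3) for i in range(steps)]

physics = simulate_ball(y0=10.0, v0=0.0, steps=5)
# Seed the extrapolator with the first two "frames" of real motion:
guessed = extrapolate_frames(physics[0], physics[1], steps=3)
```

The extrapolated frames look perfectly "consistent" from one to the next, which is exactly the point: frame-to-frame coherence is not the same thing as simulating the phenomenon.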