
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to this project.

285 Open-Sora-Plan issues

Dear authors, thanks for your interesting work and plans. However, there is one question on my mind: why did you choose to use VQ-VAE instead of a VAE? As stated both in...

Hi there! I have been watching and contributing to the text-to-video ecosystem for a long time now. Now that Sora is out, there's more attention on the subject, and I...

Thanks for the great work!!! I just found out the repo directory has changed and VideoGPT/ was moved to src/sora/modules/ae/vqvae/videogpt/. But the README is still the same (`cd VideoGPT`). Could you...

I tried an A100 (40GB SXM4) with 30 vCPUs, 200 GiB RAM, and a 512 GiB SSD, but immediately hit CUDA out of memory. Which card/config should I use? 8x A100 80GB?...

Hi, I'm using an H100 (80GB), but the specified PyTorch version (torch==1.13.1+cu117) does not support the H100's CUDA sm_90. Has anyone hit this H100 issue? How can I fix it? Many thanks!! NVIDIA...

Hi, thank you for releasing this. I noticed you mentioned that this is an "open source" project, but the license is NC, which doesn't qualify as an open-source license....

Hi team members, I would attribute the success of Sora to the training data, much like what OpenAI has done for GPT. Any ideas on curating high-quality video data?

'noist' to 'noisy'

Please, I would like to know whether it's possible to integrate the model with Node.js at all. Forgive my ignorance if this doesn't sound relevant. I'm trying to see if...

A few suggestions for the Open-Sora-Plan team:

1) Rather than merely following, try to surpass along the way. Possible technical points of advantage: on top of the base version, add detection and constraints (regularization) for large objects to avoid hallucinating things out of nothing; build a keyword-and-attribute table; add object type labeling (rigid body, quasi-rigid body, fluid); add motion-keypoint detection for rigid and quasi-rigid bodies to prevent, e.g., a hand passing through a body; use regularization, RLHF, and the like to maximize physical plausibility. (For all generative models, what ultimately decides the contest is how strongly they enforce consistency with the physical world.) ...

2) For the conditioning part: judging from your diagrams, you emphasize several image attributes. My personal suggested priority: text (the richer the captions the better, including object/scene groups, motion, and adjectives) > images (raw photos > processed images) > UI interaction data.

3) The data and compute requirements are large, so consider setting up a sponsorship page; either money or GPU cloud resources would help. Data can be prepared in parallel, especially captions for massive amounts of video, which involve a lot of manual work: even on top of algorithmic captions, humans still need to verify them. Plan ahead for the keywords users will care about at generation time: camera position (essential: e.g., the shot a director has in mind, camera trajectories such as following a character from behind and then circling around to a close-up of the face; ordinary video captions lack these; in my stereo work, this is determined online after modeling), visual style, film terminology, lighting and materials, actions and interactions, ...

4) After Sora, Genie, etc. came out, I did a study and analysis the same day, e.g.:
https://github.com/yuedajiong/super-ai/blob/main/superai-20240216-sora.png
https://github.com/yuedajiong/super-ai
My main focus is on generating "stereoscopic, dynamic/interactive, photorealistic, cinematic, complex worlds". Essentially, I care more about using an explicit 5D (dynamic, interactive) representation with strong constraints for vision generation. If any of this is technically useful, I'd be happy to join and write code. For example, photorealism, as I understand it, includes category-level realism (a person looks like a person) and also individual-level realism (Liu Yifei looks like Liu Yifei). When our text says "Liu Yifei is dancing" and we also supply a photo of Liu Yifei, the generated 5D world, or Sora's 2D world, should consistently keep Liu Yifei's face. For film and TV creation (a star's IP), and even for video-fake scenarios (no examples needed: celebrities), this consistency constraint is very important.

5) One technical idea for surpassing Sora: the user enters text one-shot, but the output is a video. When the description involves "motion", how do you decompose the "motion" text condition over time? Nobody knows how Sora does it; going only by my own experience and understanding, I would do some processing in text space before feeding in the condition. I might do it like this, for example:
input: A red sun is slowly rising; midway, a plane transits right across it.
condition-processor-network: ....
output: ...
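The "condition processor" in point 5 could be sketched as a toy text-space step that expands a one-shot motion prompt into per-segment conditions, each tagged with its temporal position. This is purely illustrative: `decompose_motion_prompt` and its tagging scheme are hypothetical and not from the Open-Sora-Plan codebase; a real processor would be a learned network.

```python
# Hypothetical sketch of the "condition processor" idea: decompose a
# one-shot motion prompt into per-segment text conditions, each tagged
# with normalized progress through the clip (t = 0.0 .. 1.0).

def decompose_motion_prompt(prompt: str, num_segments: int) -> list[str]:
    """Expand a single prompt into one condition string per video segment,
    prefixed with the segment's normalized temporal position."""
    conditions = []
    for i in range(num_segments):
        # Normalized position of this segment within the clip.
        progress = i / max(num_segments - 1, 1)
        conditions.append(f"[t={progress:.2f}] {prompt}")
    return conditions

segments = decompose_motion_prompt(
    "A red sun is slowly rising; midway, a plane transits across it.", 4
)
for c in segments:
    print(c)
```

A learned version would rewrite the prompt per segment (e.g., "sun low on the horizon" at t=0.0, "plane crossing the sun" at t=0.5) rather than repeating it, but the interface — one prompt in, a time-indexed list of conditions out — stays the same.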