
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to this project.

285 Open-Sora-Plan issues

Dear authors, thanks for your interesting work and plans. However, there is one question on my mind: why did you choose to use VQ-VAE instead of a VAE? As stated both in...

Hi there! I have been watching and contributing to the text-to-video ecosystem for a long time now. Now that Sora is out, there's more attention on the subject, and I...

Thanks for the great work!!! I just found out the repo directory has changed and VideoGPT/ was moved to src/sora/modules/ae/vqvae/videogpt/. But the README is still the same (`cd VideoGPT`). Could you...

I tried an A100 (40GB SXM4) with 30 vCPUs, 200 GiB RAM, and a 512 GiB SSD, but immediately hit CUDA out of memory. Which card/config should I use? 8x A100 80GB?...

Hi, I'm using an H100 (80GB), but the specified PyTorch version (torch==1.13.1+cu117) does not support the H100's CUDA sm_90. Has anyone hit this H100 issue? How can I fix it? Many thanks!! NVIDIA...

Hi, thank you for releasing this. I noticed you mentioned that this is an "open source" project, but the license is NC, which doesn't qualify as an open-source license....

Hi team members, I would attribute the success of Sora to the training data, much like what OpenAI has done for GPT. Any ideas on curating high-quality video data?

'noist' to 'noisy'

Please, I would like to know whether it's possible to integrate the model with Node.js at all. Forgive my ignorance if this doesn't sound relevant. I'm trying to see if...

A few suggestions for the Open-Sora-Plan team:

1) Rather than merely following, try to surpass along the way. Possible technical points of advantage: on top of the base version, add detection and constraints (regularization) for large objects to avoid hallucinating things out of nothing; build a keyword-and-attribute table; add object type labeling (rigid body, quasi-rigid body, fluid); add motion-keypoint detection for rigid and quasi-rigid bodies to prevent, e.g., a hand passing through a body; use regularization, RLHF, and the like to maximize physical plausibility. (For all generative models, what ultimately decides the contest is how strongly they enforce consistency with the physical world.) ...

2) For the conditioning part: judging from your diagrams, you emphasize several image attributes. My personal suggested priority: text (the richer the captions the better, including object/scene groups, motion, and adjectives) > images (raw photos > processed images) > UI interaction data.

3) The data and compute requirements are large, so consider setting up a sponsorship page; either money or GPU cloud resources would help. Data can be prepared in parallel, especially captions for massive amounts of video, which involve a lot of manual work: even on top of algorithmic captions, humans still need to verify them. Plan ahead for the keywords users will care about at generation time: camera position (essential: e.g., the shot a director has in mind, camera trajectories such as following a character from behind and then circling around to a close-up of the face; ordinary video captions lack these; in my stereo work, this is determined online after modeling), visual style, film terminology, lighting and materials, actions and interactions, ...

4) After Sora, Genie, etc. came out, I did a study and analysis the same day, e.g.:
https://github.com/yuedajiong/super-ai/blob/main/superai-20240216-sora.png
https://github.com/yuedajiong/super-ai
My main focus is on generating "stereoscopic, dynamic/interactive, photorealistic, cinematic, complex worlds". Essentially, I care more about using an explicit 5D (dynamic, interactive) representation with strong constraints for vision generation. If any of this is technically useful, I'd be happy to join and write code. For example, photorealism, as I understand it, includes category-level realism (a person looks like a person) and also individual-level realism (Liu Yifei looks like Liu Yifei). When our text says "Liu Yifei is dancing" and we also supply a photo of Liu Yifei, the generated 5D world, or Sora's 2D world, should consistently keep Liu Yifei's face. For film and TV creation (a star's IP), and even for video-fake scenarios (no examples needed: celebrities), this consistency constraint is very important.

5) One technical idea for surpassing Sora: the user enters text one-shot, but the output is a video. When the description involves "motion", how do you decompose the "motion" text condition over time? Nobody knows how Sora does it; going only by my own experience and understanding, I would do some processing in text space before feeding in the condition. I might do it like this, for example:
input: A red sun is slowly rising; midway, a plane transits right across it.
condition-processor-network: ....
output: ...
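The "condition processor" in point 5 could be sketched as a toy text-space step that expands a one-shot motion prompt into per-segment conditions, each tagged with its temporal position. This is purely illustrative: `decompose_motion_prompt` and its tagging scheme are hypothetical and not from the Open-Sora-Plan codebase; a real processor would be a learned network.

```python
# Hypothetical sketch of the "condition processor" idea: decompose a
# one-shot motion prompt into per-segment text conditions, each tagged
# with normalized progress through the clip (t = 0.0 .. 1.0).

def decompose_motion_prompt(prompt: str, num_segments: int) -> list[str]:
    """Expand a single prompt into one condition string per video segment,
    prefixed with the segment's normalized temporal position."""
    conditions = []
    for i in range(num_segments):
        # Normalized position of this segment within the clip.
        progress = i / max(num_segments - 1, 1)
        conditions.append(f"[t={progress:.2f}] {prompt}")
    return conditions

segments = decompose_motion_prompt(
    "A red sun is slowly rising; midway, a plane transits across it.", 4
)
for c in segments:
    print(c)
```

A learned version would rewrite the prompt per segment (e.g., "sun low on the horizon" at t=0.0, "plane crossing the sun" at t=0.5) rather than repeating it, but the interface — one prompt in, a time-indexed list of conditions out — stays the same.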