Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model), and we hope the open-source community will contribute to it.
Dear authors, thanks for your interesting work and plans. However, there is one question on my mind: why did you choose VQ-VAE instead of a VAE? As stated both in...
Hi there! I have been watching and contributing to the text2video ecosystem for a long time. Now that Sora is out, there's more attention on the subject, and I...
Thanks for the great work!!! I just found out the repo directory structure has changed and VideoGPT/ has moved to src/sora/modules/ae/vqvae/videogpt/, but the README still says `cd VideoGPT`. Could you...
I tried an A100 (40GB SXM4) with 30 vCPUs, 200 GiB RAM, and a 512 GiB SSD, but immediately hit CUDA out of memory. Which card/config should I use? 8x A100 80GB?...
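Without knowing the exact training configuration, the usual single-card OOM mitigations are smaller micro-batches with gradient accumulation, mixed precision, and activation checkpointing. Below is a minimal, self-contained sketch of all three; the toy model is a placeholder, not the actual Open-Sora-Plan network, so whether these fit the real model on 40 GB is an open question:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Toy stand-in for one transformer-style block; the real model's size and
# memory footprint are not assumed here.
class ToyBlock(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        return x + self.net(x)

blocks = nn.ModuleList(ToyBlock() for _ in range(8)).cuda()
opt = torch.optim.AdamW(blocks.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

accum_steps = 4  # gradient accumulation: several small batches per optimizer step
for _ in range(accum_steps):
    x = torch.randn(1, 256, 1024, device="cuda")  # small micro-batch
    with torch.cuda.amp.autocast(dtype=torch.float16):  # mixed precision
        for blk in blocks:
            # activation checkpointing: recompute activations during backward
            x = checkpoint(blk, x, use_reentrant=False)
        loss = x.pow(2).mean() / accum_steps
    scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```

If these still OOM, the model may simply not fit on a single 40 GB card, and a multi-GPU setup (e.g. 8x A100 80GB with FSDP or DeepSpeed) would be the next step.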
NVIDIA H100 PCIe with CUDA capability sm_90 is not compatible with the current PyTorch installation
Hi, I'm using an H100 (80GB), but the specified PyTorch version (torch==1.13.1+cu117) does not support the H100's CUDA sm_90. Has anyone run into this H100 issue? How can it be fixed? Many thanks!! NVIDIA...
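For context, the torch==1.13.1+cu117 wheels do not include sm_90 kernels, so they cannot natively target the H100 (compute capability 9.0); the usual fix is a PyTorch build compiled against CUDA 11.8 or newer, e.g. `pip install torch --index-url https://download.pytorch.org/whl/cu118`, assuming the rest of the codebase tolerates a newer torch. A quick diagnostic sketch:

```python
import torch

# Print the installed build and what the local GPU needs. On an H100 the
# device capability is (9, 0), so 'sm_90' must appear in the compiled
# architecture list for the kernels to run natively.
print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("device capability:", torch.cuda.get_device_capability(0))
print("compiled arches:", torch.cuda.get_arch_list())
```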
License
Hi, thank you for releasing this. I noticed you describe this as an "open source" project, but the license is NC, which does not qualify as an open-source license....
Hi team members, I would attribute the success of Sora to its training data, much as OpenAI has done for GPT. Any ideas on curating high-quality video data?
'noist' to 'noisy'
I would like to know whether it's possible to integrate the model with Node.js at all. Forgive my ignorance if this doesn't sound relevant. I'm trying to see if...
A few suggestions for the Open-Sora-Plan team:

1) Rather than merely following, try to surpass along the way. Technical points where you could go beyond: for example, on top of the base version, add detection of and constraints (regularization) on large objects to avoid things appearing out of nowhere; build a keyword and attribute table; add object type labeling (rigid body, quasi-rigid body, fluid); add motion-keypoint detection for rigid and quasi-rigid bodies to prevent, say, a hand passing through a body; and use regularization, RLHF, and the like to improve physical plausibility as much as possible. (For all generative models, the final contest is over the strength of the constraints enforcing consistency with the physical world.) ...

2) On the conditioning side: judging from your diagram, you focus on several image attributes. My personal suggestion on priority: text (the richer the caption the better, covering object/scene groups, motion, and adjectives) > images (raw photos > processed images) > UI interaction data.

3) The data and compute requirements are huge, so you could set up a sponsorship page; donations of money or GPU cloud resources would both help. Data can be prepared in parallel, especially captions for massive amounts of video, which involves a lot of human labor: on top of algorithmic captions, humans still need to review them. Plan ahead for the keywords users will care about at generation time: camera position (very necessary: e.g. the shot the director has in mind, the camera movement trajectory, such as following a character from behind and then circling around for a frontal close-up of the face; ordinary video captions contain none of this; in my 3D work this is determined online after modeling), visual style, film terminology, lighting and materials, action and interaction, ...

4) After Sora, Genie, etc. came out, I did a study and analysis the same day. For example: https://github.com/yuedajiong/super-ai/blob/main/superai-20240216-sora.png https://github.com/yuedajiong/super-ai My main focus is on generating "3D, dynamic/interactive, photorealistic, cinematic, complex worlds". Essentially, I care more about using an explicit 5D (dynamic, interactive) representation with strong constraints to do vision generation. If any of this is technically useful, I'd be happy to join and write code together. For example, photorealism: in my understanding it includes class-level realism (a person looks like a person) and, beyond that, individual-level realism (Liu Yifei looks like Liu Yifei). When the text prompt says "Liu Yifei is dancing" and a photo of Liu Yifei is also supplied, the generated 5D world, or Sora's 2D world, should keep Liu Yifei's face consistent throughout. For film and TV creation (celebrity IP), and even for video-fake scenarios (no examples needed, think of public figures), this identity-consistency constraint is extremely important.

5) One technical idea for going beyond Sora: the user's text is entered one-shot, but the output is a video. When the description involves motion, how do you decompose the motion text into conditions over time? Nobody knows how Sora does it; going by my own experience and understanding, I would do some processing in text space first and then feed the result in as conditions (a toy sketch follows below). For example: input: a red sun is slowly rising, and midway an airplane flies across it in transit. condition-processor-network: .... output:...
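Point 5 above can be made concrete with a toy sketch: split the one-shot prompt into keyframe sub-prompts, encode each, and interpolate the embeddings over frames so the generator receives a per-frame condition. Everything here (the EmbeddingBag stand-in encoder, the keyframe split, the linear interpolation) is an illustrative assumption, not Sora's or this repo's actual method:

```python
import torch
import torch.nn as nn

class ConditionProcessor(nn.Module):
    """Expand a one-shot motion prompt into per-frame condition vectors."""

    def __init__(self, embed_dim=512, vocab=10000):
        super().__init__()
        # Stand-in text encoder; a real system would use T5/CLIP etc.
        self.embed = nn.EmbeddingBag(vocab, embed_dim)

    def encode(self, token_ids):
        return self.embed(token_ids.unsqueeze(0))  # (1, D)

    def forward(self, keyframe_token_ids, num_frames):
        # One embedding per keyframe description, e.g. for the sun/plane
        # prompt: ["sun at horizon", "plane transiting sun", "sun fully risen"].
        keys = torch.cat([self.encode(t) for t in keyframe_token_ids])  # (K, D)
        # Linear interpolation between consecutive keyframe embeddings gives
        # each of the num_frames output frames its own condition vector.
        pos = torch.linspace(0, len(keys) - 1, num_frames)
        lo = pos.floor().long().clamp(max=len(keys) - 2)
        w = (pos - lo.float()).unsqueeze(1)
        return (1 - w) * keys[lo] + w * keys[lo + 1]  # (num_frames, D)

proc = ConditionProcessor()
keyframes = [torch.randint(0, 10000, (6,)) for _ in range(3)]  # fake token ids
print(proc(keyframes, num_frames=16).shape)  # torch.Size([16, 512])
```

A learned network could replace the fixed interpolation; the interface is the point: one-shot text in, a time-indexed condition sequence out.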