Open-Sora
Using a quality base image as start
Hi there! I'm guessing this project aims for a text2video model, but what if we first used a better base model and brought our own image (from Midjourney, for example), and built an image2video model that optimizes for consistency and efficiency? That would probably make the problem easier, and we could work on the text2image portion later on.
Also, I'm guessing some tricks were used to make the Sora demos. For example, scenes with good camera motion were probably rendered in low fidelity, used as a base for image generation, then processed with an AI enhancer + upscaler + a temporal consistency pass.
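To make the idea concrete, here's a minimal sketch of that staged pipeline using NumPy and Pillow. This is purely illustrative: the low-fidelity frames are random placeholders, Lanczos resampling stands in for an AI upscaler, and an exponential moving average stands in for a real temporal-consistency model.

```python
import numpy as np
from PIL import Image

def upscale(frame: np.ndarray, factor: int = 4) -> np.ndarray:
    """Upscale one RGB frame with Lanczos resampling (stand-in for an AI upscaler)."""
    img = Image.fromarray(frame)
    w, h = img.size
    return np.asarray(img.resize((w * factor, h * factor), Image.LANCZOS))

def temporal_smooth(frames: list, alpha: float = 0.6) -> list:
    """Blend each frame with the previous output (a crude consistency pass to reduce flicker)."""
    out = [frames[0].astype(np.float32)]
    for f in frames[1:]:
        out.append(alpha * f.astype(np.float32) + (1 - alpha) * out[-1])
    return [np.clip(f, 0, 255).astype(np.uint8) for f in out]

# Hypothetical low-fidelity render: a few tiny noisy 64x64 frames.
rng = np.random.default_rng(0)
low_fi = [rng.integers(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(8)]

# Stage 2 + 3: upscale each frame, then smooth across time.
enhanced = temporal_smooth([upscale(f) for f in low_fi])
print(enhanced[0].shape)  # frames are now 256x256 RGB
```

In a real system the upscaler and consistency pass would be learned models (e.g., a video super-resolution network), but the staging is the same: cheap base render first, quality recovered afterwards.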
Just sharing my thoughts. All the best!