Add support for open-source video generation models
Problem Statement
Pipecat currently lacks support for text-to-video generation, limiting its capabilities in multimodal workflows. This gap forces users to rely on external tools for video generation, leading to fragmented workflows and reduced efficiency. As high-quality models like CogVideoX, Mochi-1, Allegro, and LTX-Video gain traction, the absence of native support in Pipecat prevents users from leveraging these advancements in creative and production pipelines.
Proposed Solution
Motivation
🚀 Feature Request: Add Support for Video Generation Models
Please consider adding support for the following video generation models, which are currently not supported by Pipecat:
- CogVideoX
- Mochi-1
- Allegro
- LTX-Video
These models represent the latest advances in text-to-video generation, and supporting them would significantly enhance Pipecat's capabilities in the multimodal AI space.
💡 Additional Context
These models are gaining popularity in generative AI workflows for creating high-quality video from text prompts. Native integration in Pipecat would let users run them inside existing multimodal pipelines and unlock a range of creative and industrial use cases.
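For reference, all four models already ship Hugging Face diffusers pipelines, so a first integration would largely mean wrapping calls like the sketch below (CogVideoX shown; the model ID and sampling settings are illustrative values in line with the published usage example, not a recommendation):

```python
# Minimal offline generation sketch using the diffusers CogVideoXPipeline;
# Mochi-1, Allegro, and LTX-Video expose analogous pipelines.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.to("cuda")

video_frames = pipe(
    prompt="A panda strumming a guitar in a bamboo forest",
    num_frames=49,            # ~6 seconds at 8 fps
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]

export_to_video(video_frames, "panda.mp4", fps=8)
```

Generation with these models currently takes seconds to minutes per clip, which is relevant to the real-time question raised below.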
Alternative Solutions
No response
Additional Context
No response
Would you be willing to help implement this feature?
- [x] Yes, I'd like to contribute
- [ ] No, I'm just suggesting
Pipecat is used primarily in real-time use cases. Is this real-time video generation? If not, how would you expect these to work in the context of Pipecat?
What about bytedance/LatentSync? It works with voice input and is supposed to be real-time. However, there needs to be a standard way to host those models.
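To make the hosting question concrete, here is a rough sketch of how a model served behind an HTTP endpoint could be surfaced inside a Pipecat pipeline. It relies on Pipecat's `FrameProcessor` / `TextFrame` interfaces, but the endpoint URL, the `VideoURLFrame` class, and the processor name are assumptions for illustration, not existing Pipecat APIs:

```python
# Rough sketch: wrap a (hypothetical) hosted text-to-video endpoint as a
# Pipecat frame processor. FrameProcessor/TextFrame/push_frame are Pipecat
# APIs; VIDEO_API_URL, VideoURLFrame, and the class name are assumptions.
from dataclasses import dataclass

import aiohttp
from pipecat.frames.frames import Frame, TextFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

VIDEO_API_URL = "http://localhost:8000/generate"  # hypothetical model server


@dataclass
class VideoURLFrame(Frame):
    """Illustrative custom frame carrying a reference to a generated clip."""
    url: str


class HostedTextToVideoProcessor(FrameProcessor):
    """Turns incoming text prompts into references to generated videos.

    CogVideoX/Mochi/Allegro/LTX-Video take seconds to minutes per clip, so
    this is an offline/asynchronous step rather than real-time synthesis.
    """

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, TextFrame):
            # Ask the hosting service to render a clip for this prompt.
            async with aiohttp.ClientSession() as session:
                async with session.post(VIDEO_API_URL, json={"prompt": frame.text}) as resp:
                    result = await resp.json()
            await self.push_frame(VideoURLFrame(url=result["url"]), direction)
        else:
            # Pass every other frame through untouched.
            await self.push_frame(frame, direction)
```

A standard serving contract like this (prompt in, clip URL out) would let Pipecat treat offline generators and real-time models such as LatentSync behind the same kind of processor.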