Add support for open-source video generation models
Problem Statement
Pipecat currently lacks support for text-to-video generation, limiting its capabilities in multimodal workflows. This gap forces users to rely on external tools for video generation, leading to fragmented workflows and reduced efficiency. As high-quality models like CogVideoX, Mochi-1, Allegro, and LTX-Video gain traction, the absence of native support in Pipecat prevents users from leveraging these advancements in creative and production pipelines.
Proposed Solution
Motivation
🚀 Feature Request: Add Support for Video Generation Models
Please consider adding support for the following video generation models, which are currently not supported by Pipecat:
- CogVideoX
- Mochi-1
- Allegro
- LTX-Video
These models represent the latest advances in text-to-video generation, and supporting them would significantly enhance Pipecat's capabilities in the multimodal AI space.
💡 Additional Context
These models are gaining popularity in generative AI workflows for creating high-quality video from text prompts. Native integration in Pipecat would let users run them inside existing multimodal pipelines and unlock a range of creative and industrial use cases.
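For reference, all four models already ship Hugging Face diffusers pipelines, so a first integration would largely mean wrapping calls like the sketch below (CogVideoX shown; the model ID and sampling settings are illustrative values in line with the published usage example, not a recommendation):

```python
# Minimal offline generation sketch using the diffusers CogVideoXPipeline;
# Mochi-1, Allegro, and LTX-Video expose analogous pipelines.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.to("cuda")

video_frames = pipe(
    prompt="A panda strumming a guitar in a bamboo forest",
    num_frames=49,            # ~6 seconds at 8 fps
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]

export_to_video(video_frames, "panda.mp4", fps=8)
```

Generation with these models currently takes seconds to minutes per clip, which is relevant to the real-time question raised below.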
Alternative Solutions
No response
Additional Context
No response
Would you be willing to help implement this feature?
- [x] Yes, I'd like to contribute
- [ ] No, I'm just suggesting
Pipecat is used primarily in real-time use cases. Is this real-time video generation? If not, how would you expect these to work in the context of Pipecat?
What about bytedance/LatentSync? It works with voice input and is supposed to be real-time. However, there needs to be a standard way to host those models.
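To make the hosting question concrete, here is a rough sketch of how a model served behind an HTTP endpoint could be surfaced inside a Pipecat pipeline. It relies on Pipecat's `FrameProcessor` / `TextFrame` interfaces, but the endpoint URL, the `VideoURLFrame` class, and the processor name are assumptions for illustration, not existing Pipecat APIs:

```python
# Rough sketch: wrap a (hypothetical) hosted text-to-video endpoint as a
# Pipecat frame processor. FrameProcessor/TextFrame/push_frame are Pipecat
# APIs; VIDEO_API_URL, VideoURLFrame, and the class name are assumptions.
from dataclasses import dataclass

import aiohttp
from pipecat.frames.frames import Frame, TextFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

VIDEO_API_URL = "http://localhost:8000/generate"  # hypothetical model server


@dataclass
class VideoURLFrame(Frame):
    """Illustrative custom frame carrying a reference to a generated clip."""
    url: str


class HostedTextToVideoProcessor(FrameProcessor):
    """Turns incoming text prompts into references to generated videos.

    CogVideoX/Mochi/Allegro/LTX-Video take seconds to minutes per clip, so
    this is an offline/asynchronous step rather than real-time synthesis.
    """

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, TextFrame):
            # Ask the hosting service to render a clip for this prompt.
            async with aiohttp.ClientSession() as session:
                async with session.post(VIDEO_API_URL, json={"prompt": frame.text}) as resp:
                    result = await resp.json()
            await self.push_frame(VideoURLFrame(url=result["url"]), direction)
        else:
            # Pass every other frame through untouched.
            await self.push_frame(frame, direction)
```

A standard serving contract like this (prompt in, clip URL out) would let Pipecat treat offline generators and real-time models such as LatentSync behind the same kind of processor.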