idea: Inline multimedia playback in conversational threads
Problem Statement
Currently, users must navigate away from the workspace to view video results, which disrupts the creative process and diminishes the impact of motion media generation. Inline playback would let users review generated audio and video in context, immediately after it is produced. It would also extend the existing support for displaying static images inline via Markdown.
Feature Idea
- Seamless In-Line Playback: Implement HTML5 audio and video rendering functionality to allow direct playback of media files within the chat interface.
- Essential Support for Key Models: Directly address the output of powerful content creation models like Kling, Wan, Sora, and Veo.
- Align with Existing Image Support: Leverage the successful precedent of inline image display using Markdown (`![alt text](url)`), extending this functionality to video files for a consistent media experience.
- Streamlined Iteration: Eliminate the friction of external review, enabling users to instantly check a video, provide feedback, and make changes without leaving the conversation.
- High-Value Workflow Completion: Complete the last mile of the image-to-video and text-to-video workflow, transforming link outputs into immediate, high-impact media results.
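As a rough illustration of the playback idea above, here is a minimal TypeScript sketch that turns a media URL returned by a tool into an HTML5 element. Everything in it is an assumption for illustration only: the function names (`classifyMediaUrl`, `renderInlineMedia`), the extension lists, and the choice to detect media type by file extension are hypothetical and do not reflect how the app renders messages today.

```typescript
// Hypothetical sketch: names and extension lists are illustrative assumptions.
type MediaKind = "image" | "audio" | "video" | "link";

const VIDEO_EXTENSIONS = ["mp4", "webm"];
const AUDIO_EXTENSIONS = ["mp3", "wav", "m4a"];
const IMAGE_EXTENSIONS = ["png", "jpg", "jpeg", "gif", "webp"];

// Guess the media kind from the file extension, ignoring query string and fragment.
function classifyMediaUrl(url: string): MediaKind {
  const path = url.replace(/[?#][\s\S]*$/, "");
  const dot = path.lastIndexOf(".");
  const ext = dot >= 0 ? path.slice(dot + 1).toLowerCase() : "";
  if (VIDEO_EXTENSIONS.includes(ext)) return "video";
  if (AUDIO_EXTENSIONS.includes(ext)) return "audio";
  if (IMAGE_EXTENSIONS.includes(ext)) return "image";
  return "link";
}

// Escape characters that would break out of an HTML attribute or text node.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build the HTML string for inline display; unknown types fall back to a plain link.
function renderInlineMedia(url: string): string {
  const src = escapeHtml(url);
  switch (classifyMediaUrl(url)) {
    case "video":
      return `<video controls preload="metadata" src="${src}"></video>`;
    case "audio":
      return `<audio controls preload="metadata" src="${src}"></audio>`;
    case "image":
      return `<img src="${src}" alt="">`;
    default:
      return `<a href="${src}">${src}</a>`;
  }
}
```

Falling back to a plain link keeps today's behavior for anything the sketch does not recognize. A real implementation would probably also need to handle URLs without a file extension (for example by checking the response's content type) and sanitize the resulting HTML before rendering.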
Here are screenshots showing how image generation can work well today using a Fal.ai MCP tool. Adding support for video would round this out very well. Note: the assistants often struggle with the fine points of tool usage, and success depends heavily on the model and system prompt.