Add support for Janus model from DeepSeek AI

Open ighoshsubho opened this issue 1 year ago • 0 comments

Model description

Janus is an autoregressive framework that unifies multimodal understanding and generation. Unlike previous approaches that use a single visual encoder for both tasks, Janus decouples visual encoding into separate pathways, one for understanding and one for generation, while still processing both with a single, unified transformer. This decoupling resolves the conflict between the two roles a shared visual encoder would otherwise have to play, and it makes the framework more flexible and improves performance.

Key features:

Unified framework for multimodal understanding and generation
Decoupled visual encoding pathways
Single, unified transformer architecture for processing
Improved performance in multimodal understanding tasks
Flexibility to select optimal encoding methods for each component
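
To make the decoupling concrete for whoever picks this up, here is a minimal, dependency-free sketch of the idea described above: two independent visual pathways feeding one shared sequence model. It is not the Janus implementation and not a proposed transformers API. Every name in it (`understanding_encoder`, `generation_tokenizer`, `embed_ids`, `shared_transformer`, `run`, `D_MODEL`) is hypothetical, and the "encoders" and "transformer" are toy stand-ins chosen only so the example runs.

```python
from typing import List

Vector = List[float]
D_MODEL = 4  # toy hidden size (assumption, not a Janus value)


def understanding_encoder(image: List[float]) -> List[Vector]:
    """Hypothetical understanding pathway: one continuous feature vector per patch."""
    return [[patch] * D_MODEL for patch in image]


def generation_tokenizer(image: List[float], codebook_size: int = 8) -> List[int]:
    """Hypothetical generation pathway: quantize each patch (in [0, 1]) to a discrete id."""
    return [min(int(patch * codebook_size), codebook_size - 1) for patch in image]


def embed_ids(ids: List[int]) -> List[Vector]:
    """Map discrete ids into the same D_MODEL space the shared model consumes."""
    return [[float(i)] * D_MODEL for i in ids]


def shared_transformer(sequence: List[Vector]) -> List[Vector]:
    """Toy stand-in for the unified model: a causal running mean over the sequence."""
    outputs: List[Vector] = []
    running = [0.0] * D_MODEL
    for step, vec in enumerate(sequence, start=1):
        running = [r + v for r, v in zip(running, vec)]
        outputs.append([r / step for r in running])
    return outputs


def run(task: str, text_embeds: List[Vector], image: List[float]) -> List[Vector]:
    """Pick the visual pathway by task, then reuse the same shared model for both."""
    if task == "understanding":
        visual = understanding_encoder(image)
    elif task == "generation":
        visual = embed_ids(generation_tokenizer(image))
    else:
        raise ValueError(f"unknown task: {task}")
    return shared_transformer(text_embeds + visual)


if __name__ == "__main__":
    text = [[0.0] * D_MODEL]  # one toy text-token embedding
    image = [0.1, 0.9]        # two toy "patches"
    print(len(run("understanding", text, image)))
    print(generation_tokenizer(image))
```

The point of the sketch is only the control flow: the pathway is selected per task, both pathways emit vectors of the same width, and `shared_transformer` is the single component reused across tasks. A real port would replace each stand-in with the corresponding module from the reference repository.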

Open source status

[X] The model implementation is available
[X] The model weights are available

Provide useful links for the implementation

The Janus model is developed by DeepSeek AI. Here are the relevant links for implementation:

Paper: Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
GitHub repository: deepseek-ai/Janus

Oct 18 '24 18:10 ighoshsubho