Add support for Janus model from DeepSeek AI
Model description
Janus is an autoregressive framework that unifies multimodal understanding and generation. Unlike previous approaches that use a single visual encoder for both tasks, Janus decouples visual encoding into separate pathways while still processing both with a single, unified transformer architecture. This decoupling addresses the conflict between the roles a visual encoder plays in understanding and in generation, which improves both flexibility and performance.
Key features:
- Unified framework for multimodal understanding and generation
- Decoupled visual encoding pathways
- Single, unified transformer architecture for processing
- Improved performance in multimodal understanding tasks
- Flexibility to select optimal encoding methods for each component
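To make the decoupling concrete, here is a minimal, dependency-free sketch of the idea described above: two separate visual pathways feeding one shared sequence model. Everything in it is an assumption for illustration only. The function names (`understanding_encoder`, `generation_tokenizer`, `shared_backbone`, `build_inputs`), the choice of continuous features for understanding and discrete token ids for generation, and the toy running-mean "backbone" are hypothetical stand-ins, not the Janus implementation or any transformers API.

```python
from typing import List

Vector = List[float]
Image = List[List[float]]  # toy "image": a list of pixel rows


def understanding_encoder(image: Image, dim: int = 4) -> List[Vector]:
    """Hypothetical understanding pathway: one continuous feature vector per row."""
    return [[sum(row) / len(row)] * dim for row in image]


def generation_tokenizer(image: Image, codebook_size: int = 16) -> List[int]:
    """Hypothetical generation pathway: one discrete token id per row."""
    return [int(sum(row) * 1000) % codebook_size for row in image]


def embed_ids(ids: List[int], dim: int = 4) -> List[Vector]:
    """Map discrete ids into the same vector space the backbone consumes."""
    return [[float(i)] * dim for i in ids]


def build_inputs(image: Image, task: str, dim: int = 4) -> List[Vector]:
    """Route the image through the pathway that matches the task."""
    if task == "understanding":
        return understanding_encoder(image, dim)
    if task == "generation":
        return embed_ids(generation_tokenizer(image), dim)
    raise ValueError(f"unknown task: {task!r}")


def shared_backbone(sequence: List[Vector]) -> List[Vector]:
    """Toy stand-in for the unified model: a causal running mean.

    Position t only sees positions 0..t, mimicking autoregressive masking.
    """
    if not sequence:
        return []
    outputs: List[Vector] = []
    running = [0.0] * len(sequence[0])
    for t, vec in enumerate(sequence, start=1):
        running = [r + v for r, v in zip(running, vec)]
        outputs.append([r / t for r in running])
    return outputs


image = [[0.0, 1.0], [1.0, 1.0]]
understanding_out = shared_backbone(build_inputs(image, "understanding"))
generation_out = shared_backbone(build_inputs(image, "generation"))
```

The point of the sketch is the shape of the design: either pathway can be swapped out independently, as long as it produces vectors of the width the shared backbone expects.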
Open source status
- [X] The model implementation is available
- [X] The model weights are available
Provide useful links for the implementation
The Janus model is developed by DeepSeek AI. Here are the relevant links for implementation:
- Paper: Janus: Bridging the Gap Between Multimodal Understanding and Generation
- GitHub repository: deepseek-ai/Janus