
Add support for Janus model from DeepSeek AI

Open ighoshsubho opened this issue 1 year ago • 0 comments

Model description

Janus is an autoregressive framework that unifies multimodal understanding and generation. Unlike previous approaches that use a single visual encoder for both tasks, Janus decouples visual encoding into separate pathways while still processing everything with one unified transformer. The decoupling resolves the conflict between the different roles a single visual encoder would have to play in understanding and in generation, which makes the framework more flexible and improves performance.

Key features:

  • Unified framework for multimodal understanding and generation
  • Decoupled visual encoding pathways
  • Single, unified transformer architecture for processing
  • Improved performance in multimodal understanding tasks
  • Flexibility to select optimal encoding methods for each component
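To make the "decoupled pathways, one shared transformer" idea concrete for reviewers, here is a minimal pure-Python sketch. It is not the Janus implementation: `understanding_encoder`, `generation_encoder`, `embed_text`, `shared_backbone`, `forward`, and the toy width `DIM` are all hypothetical stand-ins, the ordering of text before visual embeddings is an assumption, and the "backbone" is a causal running mean rather than a real transformer. It only illustrates how the task selects the visual pathway while the sequence model stays the same.

```python
from typing import Callable, Dict, List

Vector = List[float]
DIM = 4  # toy embedding width, chosen arbitrarily for this sketch


def understanding_encoder(pixels: List[float]) -> List[Vector]:
    # Hypothetical stand-in for a semantic feature extractor:
    # each pixel value becomes one continuous feature vector.
    return [[p] * DIM for p in pixels]


def generation_encoder(pixels: List[float]) -> List[Vector]:
    # Hypothetical stand-in for a discrete image tokenizer:
    # quantize each pixel to a codebook id, then embed the id as one-hot.
    ids = [min(int(p * DIM), DIM - 1) for p in pixels]
    return [[1.0 if i == k else 0.0 for i in range(DIM)] for k in ids]


def embed_text(tokens: List[int]) -> List[Vector]:
    # Toy text embedding: repeat the token id across the embedding width.
    return [[float(t)] * DIM for t in tokens]


def shared_backbone(sequence: List[Vector]) -> List[Vector]:
    # Placeholder for the unified autoregressive transformer: a causal
    # running mean, so position t only depends on positions <= t.
    out: List[Vector] = []
    running = [0.0] * DIM
    for t, vec in enumerate(sequence, start=1):
        running = [r + v for r, v in zip(running, vec)]
        out.append([r / t for r in running])
    return out


# The task picks the visual pathway; the backbone is shared by both.
VISUAL_PATHWAYS: Dict[str, Callable[[List[float]], List[Vector]]] = {
    "understanding": understanding_encoder,
    "generation": generation_encoder,
}


def forward(task: str, text_tokens: List[int], pixels: List[float]) -> List[Vector]:
    visual = VISUAL_PATHWAYS[task](pixels)
    # Assumed ordering: text embeddings first, then visual embeddings.
    return shared_backbone(embed_text(text_tokens) + visual)


if __name__ == "__main__":
    pixels = [0.1, 0.9]
    for task in VISUAL_PATHWAYS:
        hidden = forward(task, [1, 2], pixels)
        print(task, len(hidden), hidden[-1])
```

Swapping either encoder for a different one only touches its entry in `VISUAL_PATHWAYS`, which is the flexibility the last bullet refers to.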

Open source status

  • [X] The model implementation is available
  • [X] The model weights are available

Provide useful links for the implementation

The Janus model is developed by DeepSeek AI. Here are the relevant links for implementation:

Paper: Janus: Bridging the Gap Between Multimodal Understanding and Generation

GitHub repository: deepseek-ai/Janus

ighoshsubho, Oct 18 '24 18:10