transformers
Implement LlamaGen for Image Generation
Feature request
Add support for LlamaGen, an autoregressive image generation model, to the Transformers library. LlamaGen applies the next-token prediction paradigm of large language models to visual generation.
Paper: https://arxiv.org/abs/2406.06525
Code: https://github.com/FoundationVision/LlamaGen
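To make the paradigm concrete, here is a minimal sketch of next-token prediction over a flattened grid of discrete image tokens. The `next_token_logits_fn` callable is a stand-in for the Llama-style transformer (this is an illustration of the sampling loop, not LlamaGen's actual implementation):

```python
import numpy as np

def generate_image_tokens(next_token_logits_fn, seq_len, vocab_size, rng):
    # Autoregressively sample `seq_len` discrete image tokens, one at a
    # time, exactly as an LLM samples text tokens. The tokens would later
    # be reshaped into a grid and decoded to pixels by the image tokenizer.
    tokens = []
    for _ in range(seq_len):
        logits = next_token_logits_fn(tokens)   # condition on prefix
        probs = np.exp(logits - logits.max())   # numerically stable softmax
        probs /= probs.sum()
        tokens.append(int(rng.choice(vocab_size, p=probs)))
    return tokens
```

A real implementation would run this loop with a KV cache inside `generate()`, but the control flow is the same.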
Key components to implement:
- Image tokenizer
- Autoregressive image generation model (based on Llama architecture)
- Class-conditional and text-conditional image generation
- Classifier-free guidance for sampling
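On the last component, classifier-free guidance at sampling time combines conditional and unconditional logits. A minimal sketch of the standard CFG formula (parameter names are illustrative, not the proposed API):

```python
import numpy as np

def cfg_logits(cond_logits, uncond_logits, scale):
    # Classifier-free guidance: extrapolate from the unconditional logits
    # toward the conditional ones; scale > 1 strengthens conditioning,
    # scale == 1 recovers plain conditional sampling.
    return uncond_logits + scale * (cond_logits - uncond_logits)
```

In practice this means running the model on two batches per step (with and without the class/text condition), which the implementation would need to support efficiently.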
Motivation
LlamaGen demonstrates that vanilla autoregressive models without vision-specific inductive biases can achieve state-of-the-art image generation performance. Implementing it in Transformers would enable easier experimentation and integration with existing language models.
Your contribution
I can help contribute this model, and can provide examples and detailed explanations of the model architecture and training process if needed.