LlamaGen

Mask guidance, inpainting and outpainting

Open sahil02235 opened this issue 1 year ago • 7 comments

Thanks for the awesome paper. The codebase is also very easy to use. Could you please run some initial experiments on mask-guided image generation, inpainting, and outpainting? It would really help the community.

sahil02235, Jul 25 '24 07:07

In my opinion, autoregressive models are actually not very good at mask-guided image generation.

daiyixiang666, Jul 25 '24 08:07

@daiyixiang666 In theory, I don't see why it should perform badly; it depends on how you do the conditioning.
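To make the conditioning question concrete, here is a minimal sketch of one common baseline for inpainting with an autoregressive image model, assuming image tokens are generated in raster (row-major) order: known tokens are kept fixed (teacher-forced), and only masked positions are sampled. This is not claimed to be LlamaGen's method. `inpaint_raster` and the toy `copy_last` model are hypothetical stand-ins; a real model would map a token prefix (plus any class or text condition) to next-token logits over its VQ codebook. The toy run shows the limitation raised above: the filled positions never see the known token that comes after them.

```python
import numpy as np

def inpaint_raster(tokens, known_mask, next_token_logits, rng):
    """Fill unknown positions in raster order, keeping known tokens fixed.

    tokens: 1-D int array of length H*W (values at unknown positions are ignored)
    known_mask: 1-D bool array, True where the token is given
    next_token_logits: hypothetical callable mapping a prefix (1-D int array)
        to logits over the codebook for the next position
    """
    out = tokens.copy()
    for i in range(len(out)):
        if known_mask[i]:
            continue  # teacher-force: keep the given token as-is
        # The model only sees out[:i]; known tokens after position i are invisible.
        logits = next_token_logits(out[:i])
        probs = np.exp(logits - logits.max())  # softmax, shifted for stability
        probs /= probs.sum()
        out[i] = rng.choice(len(probs), p=probs)
    return out

def copy_last(prefix, vocab=8):
    # Toy stand-in for a model: puts all probability on repeating the last token.
    if len(prefix) == 0:
        return np.zeros(vocab)  # uniform logits for an empty prefix
    logits = np.full(vocab, -np.inf)
    logits[prefix[-1]] = 0.0
    return logits

tokens = np.array([3, 3, 3, 3, 5, 0, 0, 1])  # a 2x4 token grid, flattened
known = np.array([True, True, True, True, True, False, False, True])
result = inpaint_raster(tokens, known, copy_last, np.random.default_rng(0))
print(result.tolist())  # [3, 3, 3, 3, 5, 5, 5, 1]
```

Positions 5 and 6 are filled purely from the left context, so nothing pushes them toward agreeing with the known token `1` at position 7. This is why raster-order AR inpainting of interior holes tends to need extra conditioning (or a different generation order) rather than simple token clamping.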

sahil02235, Jul 25 '24 10:07

Quoting @daiyixiang666: "The autoregressive model actually is not really good for mask guidance image generation in my opinion."

I think autoregressive models perform better when aligned with language models.

enjoyyi00, Jul 25 '24 10:07

If the mask is just like a causal mask, I think it will work well, but in real life we do not always have a causal mask.
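A minimal sketch of what "like a causal mask" means in practice, assuming row-major (raster) token order: the known region can be fed directly as a prompt only when it forms a prefix of the token sequence. `is_raster_prefix` is a hypothetical helper written for illustration, not part of any codebase.

```python
import numpy as np

def is_raster_prefix(known_mask_2d):
    """True if the known tokens form a prefix of the raster (row-major) order."""
    flat = np.asarray(known_mask_2d, dtype=bool).ravel()  # row-major flatten
    n_known = int(flat.sum())
    # Every known token must come before every unknown token.
    return bool(flat[:n_known].all())

h, w = 4, 4
top_known = np.zeros((h, w), dtype=bool)
top_known[:2, :] = True      # outpaint the bottom half
hole = np.ones((h, w), dtype=bool)
hole[1:3, 1:3] = False       # inpaint a hole in the center
left_known = np.zeros((h, w), dtype=bool)
left_known[:, :2] = True     # outpaint the right half

print(is_raster_prefix(top_known))   # True
print(is_raster_prefix(hole))        # False
print(is_raster_prefix(left_known))  # False
```

Under this assumption only downward outpainting is "causal". Even extending an image to the right is not, because each row's unknown tokens come before the next row's known ones, which matches the point that real-world masks are usually not causal.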

daiyixiang666, Jul 25 '24 13:07

@iFighting It's true that autoregressive models can generate high-quality images without text-based or mask-based conditioning, but diffusion models such as DiT have also shown impressive results. Diffusion models do, however, face challenges with text-based and mask-based conditioning. Since autoregressive models generate high-quality images and handle text alignment well, a promising direction for future research is to explore other kinds of conditioning in these models, measure how accurately those conditioning methods work, and address the limitations that come up.

sahil02235, Jul 26 '24 21:07

For controllable generation, see (reference): https://arxiv.org/pdf/2406.09750

sahil02235, Jul 26 '24 21:07

We will be releasing the code soon. If you are interested in controllable AR generation, please keep an eye on https://github.com/lxa9867/ControlVAR.

lxa9867, Jul 27 '24 09:07