Mehdi Cherti

51 comments by Mehdi Cherti

"Does that mean there's no 500x iterations to get a good-looking image?" Yes.

Following the tweet you mentioned above, here is an example with the prompt "deviantart, volcano", generated by a model currently being trained on Conceptual Captions 12M: https://imgur.com/a/cYMsNo5

@johndpope I added a bunch of pre-trained models, if you want to give them a try.

Hi @s13kman, thanks for your interest! Since you have access to multiple GPUs, I would suggest using multi-GPU training to speed things up. Multi-GPU training is actually supported through Horovod...

Hi @CrossLee1, sorry for only replying now. It takes around 6 hours, but I train the models on 64 A100 GPUs (data parallel with Horovod) to speed up the process....
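For readers unfamiliar with the setup: in synchronous data-parallel training each GPU processes its own slice of the global batch, and the gradients are averaged across workers (Horovod does this with an allreduce) before every worker applies the same update. The toy simulation below is only an illustration of that averaging idea on a 1-D linear model in plain Python; it does not use Horovod, and the function names (`grad_mse`, `shard`, `allreduce_mean`, `data_parallel_step`) are hypothetical.

```python
# Toy simulation of synchronous data-parallel SGD (the gradient-averaging
# scheme that Horovod implements with allreduce). Hypothetical helpers;
# no Horovod or GPUs involved.

def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the 1-D model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def shard(data, num_workers, rank):
    """Worker `rank` takes every num_workers-th example (equal shards assumed)."""
    return data[rank::num_workers]

def allreduce_mean(values):
    """Stand-in for an allreduce: every worker ends up with the average."""
    return sum(values) / len(values)

def data_parallel_step(w, xs, ys, num_workers, lr):
    # Each worker computes a gradient on its own shard of the global batch.
    local_grads = [
        grad_mse(w, shard(xs, num_workers, r), shard(ys, num_workers, r))
        for r in range(num_workers)
    ]
    # Averaging makes every worker apply the identical update.
    return w - lr * allreduce_mean(local_grads)

xs = [float(i) for i in range(1, 9)]
ys = [3.0 * x for x in xs]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, xs, ys, num_workers=4, lr=0.01)
print(round(w, 4))  # prints 3.0
```

With equal-sized shards the averaged gradient equals the full-batch gradient, so adding workers lets each step process a larger global batch in roughly the same wall-clock time, which is where the speed-up comes from.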

Yes, I see exactly what you mean, and I noticed this in all the models I trained (both VitGAN and mlp_mixer); this was the reason why I started, by the way...

It could also be related to the architectures themselves (VitGAN and mlp_mixer), though I'm not sure. Another way to make the constraint even more explicit is to add a diversity loss on...
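The comment is cut off before saying what the diversity loss would be computed on. Purely as an illustration, assuming it penalizes similarity between the outputs (or their feature vectors) produced for different prompts in the same batch, one simple form is the mean pairwise cosine similarity, which a training loop could add to the main objective with some weight. The function names and the choice of cosine similarity are assumptions, not the project's actual loss.

```python
import math

# Hypothetical diversity loss: mean pairwise cosine similarity between
# feature vectors of different samples in a batch. Lower means more diverse.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def diversity_loss(features):
    """Average cosine similarity over all unordered pairs in the batch."""
    n = len(features)
    if n < 2:
        return 0.0  # nothing to compare
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine_similarity(features[i], features[j])
            pairs += 1
    return total / pairs

collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # every sample identical
diverse = [[1.0, 0.0], [0.0, 1.0]]                # orthogonal samples
print(diversity_loss(collapsed), diversity_loss(diverse))
```

A fully collapsed batch scores 1.0 and orthogonal features score 0.0, so minimizing this term pushes outputs for different inputs apart; the weight it gets relative to the main loss would have to be tuned.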

"For instance, the 'photo of san francisco' captions tend to produce wildly different outputs." Ah okay, so what are the text prompts where you observed different outputs,...

Another way to see it is through interpolation. Here is a video showing interpolation (of the encoded text features) from "the sun is a vanishing point absorbing shadows" to "the moon is...
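A minimal sketch of the interpolation idea, assuming plain linear interpolation between the two encoded prompts: generate evenly spaced points between the start and end feature vectors, then render each point with the feed-forward model to get one video frame. The helpers below are hypothetical; the text encoder and generator are not shown and the two-dimensional vectors stand in for real text features.

```python
# Hypothetical sketch: linearly interpolate between two text feature vectors.
# In practice `start` and `end` would come from a text encoder applied to the
# two prompts, and each interpolated vector would be fed to the generator.

def lerp(a, b, t):
    """Point at fraction t (0..1) along the straight line from a to b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

def interpolation_path(start, end, num_frames):
    """num_frames vectors evenly spaced from start to end, both included."""
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    return [lerp(start, end, i / (num_frames - 1)) for i in range(num_frames)]

path = interpolation_path([0.0, 1.0], [1.0, 0.0], 5)
print(path)  # [[0.0, 1.0], [0.25, 0.75], [0.5, 0.5], [0.75, 0.25], [1.0, 0.0]]
```

If the text features are L2-normalized, the midpoints of a straight line between two unit vectors have norm below 1; spherical interpolation, or renormalizing each frame, is a common alternative, though which one the video used is not stated.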