Similar ideas
Hey! Great work on the original notebook. @mehdidc is working on a similar project now as well. I recommended your original CLIP Decision Transformer notebook to them and then found this repository!
Here's where we were discussing things: https://github.com/mehdidc/feed_forward_vqgan_clip/issues/1
@mehdidc is using the captions I scraped from the OpenAI DALL-E blog post.
They seem to encourage compositional diversity and zero-shot style transfer. They could be useful to you if you don't have them already:
https://www.dropbox.com/s/p0qwhefid4p8q0u/blog_captions.tar.gz
There are about a million captions in there in total, with a good deal of repetition. You might also see the occasional {}, which will need to be cleaned.
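A rough sketch of that cleanup in Python. It assumes the captions have already been read into a list of strings (the archive's file layout isn't described here, so loading is left out). It also treats any caption still containing a literal `{}` as an unfilled template and drops it rather than trying to patch it. `clean_captions` is just a hypothetical helper name, not something from either repo:

```python
from typing import Iterable, List


def clean_captions(captions: Iterable[str]) -> List[str]:
    """Deduplicate captions and drop ones containing an unfilled "{}".

    Hypothetical helper: assumes captions are already loaded as strings.
    """
    seen = set()
    cleaned = []
    for caption in captions:
        # Collapse runs of whitespace so trivially different copies dedupe together.
        text = " ".join(caption.split())
        # Skip empty lines and captions that still contain a "{}" slot.
        if not text or "{}" in text:
            continue
        # Keep only the first occurrence of each caption, preserving order.
        if text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    sample = [
        "an armchair in the shape of an avocado",
        "an armchair in the shape of an avocado ",
        "a {} made of glass",
        "",
    ]
    print(clean_captions(sample))
```

Dropping rather than stripping the `{}` seemed safer, since removing it usually leaves a dangling phrase like "a made of glass".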