Clay Mullis
@mehdidc Interesting, so this happens with mlp-mixer too? Hm. There are of course counterexamples which sort of defy the notion that it's a universal problem - if the...
@mehdidc Indeed - but I saw very similar results from the `prepend all captions with minimalism` checkpoint, and I believe it would still have happened with no modifications to the captions.
Generating a video from the training samples is maybe the best way to illustrate the issue:

```sh
ffmpeg -framerate 15 -pattern_type glob -i '*.png' -c:v libx264 -pix_fmt yuv420p training_as_video.mp4
```
Another possible direction: @kevinzakka has a colab notebook here for getting saliency maps out of CLIP from a specific text-image prompt. https://github.com/kevinzakka/clip_playground/blob/main/CLIP_GradCAM_Visualization.ipynb 
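For reference, a minimal sketch of the GradCAM-style idea on CLIP's ResNet image encoder, in the spirit of the linked notebook rather than its exact code; the image path and text prompt below are placeholders, and it assumes the OpenAI `clip` package is installed:

```python
# Sketch: GradCAM-style saliency from CLIP RN50 for a text-image pair (illustrative only).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)

image = preprocess(Image.open("example.png")).unsqueeze(0).to(device)  # placeholder image
text = clip.tokenize(["a minimalist painting"]).to(device)             # placeholder prompt

# Capture the last ResNet stage's activations and keep their gradient.
feats = {}
def hook(_module, _inputs, output):
    output.retain_grad()
    feats["act"] = output

handle = model.visual.layer4.register_forward_hook(hook)

image_features = model.encode_image(image)
text_features = model.encode_text(text)
similarity = torch.cosine_similarity(image_features, text_features).sum()
similarity.backward()
handle.remove()

# GradCAM: weight each channel by its average gradient, sum over channels, ReLU, normalize.
act, grad = feats["act"], feats["act"].grad
weights = grad.mean(dim=(2, 3), keepdim=True)
cam = torch.relu((weights * act).sum(dim=1))
cam = cam / cam.max()  # low-res saliency map to upsample over the input image
```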
@mesw As far as I can tell it is inherited from the biases present in CLIP, and perhaps from the specific captions used to train the checkpoints in this repo. @mehdidc oh yes...
@mehdidc @zeke The distinguishing information is:
- `modelType: ["mlp_mixer", "vitgan"]` -> basically "experimental (mlp_mixer) versus established (vitgan)"
- `version: ["v0.1", "v0.2"]` -> not sure what the precise differences are here, @mehdidc ?...
This project requires a CUDA-enabled (Nvidia) GPU to run. I don't believe any Apple products will work, as they only support AMD GPUs.
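A quick sanity check (not part of the repo, just an illustrative snippet) to confirm PyTorch can actually see a CUDA-capable GPU before trying to run anything:

```python
import torch

if torch.cuda.is_available():
    print("CUDA device found:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device found; this project will not run on this machine.")
```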
This issue can be closed now. @lucidrains
Does that mean something like

```IPython
%tensorflow_version 1.x
%pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio===0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
%pip install --upgrade CLIP
%pip install deep-daze --upgrade
```

would be required in the colab...
@lucidrains I'm quite curious about this as well. I have been staring at this block of code for a while now and haven't been able to understand it.