Results 45 comments of neverix

That is not Stable Diffusion; it's an older model that has been available since April at https://github.com/CompVis/latent-diffusion

Yes, this specific checkpoint is causing a lot of confusion

Have you gotten it to work?

Nice, this will be useful for porting audio and pose models

Right now the code can't run a single forward pass over all tokens because of how caching is implemented. It has to step through the tokens one at a time instead of just masking the attention
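To illustrate the difference, here is a rough sketch in plain Python, not the repository's actual code. It assumes a single attention head, and the names `masked_forward` and `cached_decode` are made up for this example. One function handles all tokens in a single pass with a causal mask; the other steps through tokens while growing a key/value cache. Both should produce the same outputs.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax; scores of -inf receive weight 0."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mix(weights, values):
    """Weighted sum of value vectors."""
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def masked_forward(queries, keys, values):
    """One pass over all tokens: future positions are masked with -inf."""
    n = len(queries)
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for t in range(n):
        scores = [
            dot(queries[t], keys[j]) * scale if j <= t else float("-inf")
            for j in range(n)
        ]
        outputs.append(mix(softmax(scores), values))
    return outputs


def cached_decode(queries, keys, values):
    """Token-by-token pass: each step only sees what is in the cache so far."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    cache_k, cache_v, outputs = [], [], []
    for q, k, v in zip(queries, keys, values):
        cache_k.append(k)
        cache_v.append(v)
        scores = [dot(q, ck) * scale for ck in cache_k]
        outputs.append(mix(softmax(scores), cache_v))
    return outputs


# Hypothetical toy inputs: 5 tokens, 4-dimensional head.
random.seed(0)
n_tokens, head_dim = 5, 4
make = lambda: [[random.gauss(0, 1) for _ in range(head_dim)] for _ in range(n_tokens)]
q, k, v = make(), make(), make()

full = masked_forward(q, k, v)
step = cached_decode(q, k, v)
```

If the cached path is the only one implemented, every token has to go through the loop in `cached_decode`; the masked variant is what would allow a single forward call over the whole sequence.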

#80 solves this

Look at [the scripts](https://github.com/CompVis/taming-transformers/blob/master/scripts/reconstruction_usage.ipynb), they're pretty helpful

I think I finally figured it out.

1) `!pip install mesh-transformer-jax/ jax==0.2.12 tensorflow==2.5.0 chex==0.0.6 jaxlib==0.3.7`

2)
```
#@title Patch 1
%%file /usr/local/lib/python3.7/dist-packages/chex/_src/pytypes.py
# Lint as: python3
# Copyright 2020 DeepMind...
```

Ideally there would be a converter, plus an option to ignore mismatched inputs/outputs in case the model has a different number of channels
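A minimal sketch of the "ignore mismatches" part, assuming parameters can be compared by name and shape. The helper `filter_compatible` and the parameter names are hypothetical and not taken from any existing converter. It only decides which checkpoint entries could be loaded as-is and which would be skipped.

```python
def filter_compatible(checkpoint_shapes, model_shapes):
    """Split checkpoint parameter names into loadable and skipped lists.

    A parameter is loadable only if the model has an entry with the same
    name and exactly the same shape.
    """
    loadable, skipped = [], []
    for name, shape in checkpoint_shapes.items():
        if model_shapes.get(name) == shape:
            loadable.append(name)
        else:
            skipped.append(name)
    return loadable, skipped


# Hypothetical shapes: the target model takes 4 input channels instead of 3
# and has a different number of output channels.
checkpoint_shapes = {
    "conv_in.weight": (64, 3, 3, 3),
    "mid.weight": (64, 64, 3, 3),
    "conv_out.weight": (3, 64, 3, 3),
}
model_shapes = {
    "conv_in.weight": (64, 4, 3, 3),
    "mid.weight": (64, 64, 3, 3),
    "conv_out.weight": (8, 64, 3, 3),
}

loadable, skipped = filter_compatible(checkpoint_shapes, model_shapes)
```

Here only the middle layer would be copied over, while the input and output layers would keep their fresh initialization.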

I made a similar fix, can confirm that this works