Results: 13 comments by Emily
Wonderful! I tested this (cherry-picked on top of 0.4.26) with my ring attention algorithm and it looks like it works. :)
Oh, looks like JAX has been updated and they've reorganized their array types *again*. I can fix it.
To clarify, the `max(0, ·)` trick is only applied to the bottleneck loss that goes into training the discriminator. The update step for `beta` should use the unadulterated `th.mean(kl_divergence) -...
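A rough sketch of the distinction, in plain Python so it runs without any framework. The comment above is cut off, so the quantity subtracted from the mean KL is not known here; `target_kl` is a hypothetical stand-in for it, and `beta_step`, `bottleneck_terms`, and the final clamp that keeps `beta` non-negative are likewise assumptions for illustration, not taken from the original code.

```python
def bottleneck_terms(kl_values, target_kl, beta, beta_step):
    """Return (discriminator bottleneck loss term, updated beta).

    Hypothetical helper: target_kl and beta_step are illustrative names.
    """
    mean_kl = sum(kl_values) / len(kl_values)

    # Raw constraint value; it can be negative when the mean KL
    # is below the (assumed) target.
    raw_constraint = mean_kl - target_kl

    # The max(0, .) trick applies only to the term that enters
    # the discriminator's training loss.
    loss_term = beta * max(0.0, raw_constraint)

    # The beta update uses the raw, unclamped constraint, so beta can
    # shrink when the constraint is already satisfied.
    # Assumption: beta itself is kept non-negative.
    new_beta = max(0.0, beta + beta_step * raw_constraint)

    return loss_term, new_beta


# Mean KL below the target: the loss term is zero, but beta still decreases.
loss_term, new_beta = bottleneck_terms([0.1, 0.3], target_kl=0.5, beta=1.0, beta_step=0.1)
```

If the clamped value were reused in the `beta` update, `beta` could never decrease, which seems to be the situation the comment is warning about.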