Bagheera

446 comments by Bagheera

Something still isn't quite right on MPS; maybe the lack of 8-bit optimisers hurts more than I'd think, haha. We see sampling speed improvements up to bsz=8 and then...

![image](https://github.com/huggingface/diffusers/assets/59658056/b978e62e-8065-47ce-af0b-dac2f412542d)

Tested the above training implementation on (so far) 300 steps of `ptx0/photo-concept-bucket` at a decent learning rate and a batch size of 4 on an Apple M3 Max. It's definitely learning....

Unfortunately I hit a NaN at the 628th step of training, approximately the same place as before

Looks like it could be https://github.com/pytorch/pytorch/issues/118115, as both of the optimizers that fail in this way use `addcdiv`
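For context, `addcdiv` is the fused element-wise `input + value * (tensor1 / tensor2)` that Adam-family optimizers use for their parameter update, so a broken backend kernel there corrupts every parameter it touches at once. A rough pure-Python sketch of what the op computes (illustrative helper, not the torch kernel; the MPS bug is in the backend implementation, not this arithmetic):

```python
import math

def addcdiv(inp, tensor1, tensor2, value=1.0):
    """Pure-Python sketch of torch.addcdiv:
    inp + value * (tensor1 / tensor2), element-wise."""
    return [x + value * (a / b) for x, a, b in zip(inp, tensor1, tensor2)]

# Adam-style update: p <- p - lr * exp_avg / (sqrt(exp_avg_sq) + eps)
lr, eps = 1e-2, 1e-8
params = [1.0, 2.0]
exp_avg = [0.1, -0.2]
denom = [math.sqrt(0.04) + eps, math.sqrt(0.09) + eps]
params = addcdiv(params, exp_avg, denom, value=-lr)
print(params)
```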

It crashed after 628 steps, and then on resume it crashed after another 300 steps, on the 901st. It also seems to get a lot slower than it should sometimes -...

@sayakpaul you know what, it ended up being a cached latent with NaN values. I ran the SDXL VAE in fp16 mode since I was using PyTorch 2.2 a...

Using the madebyollin SDXL VAE fp16 model, it occasionally NaNs, but not often enough to find the issue right away
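A cheap way to catch this class of bug is to validate latents as they're cached rather than letting a bad one surface hundreds of steps into training. A minimal sketch, using a plain-Python finiteness check as a stand-in for `torch.isfinite(latents).all()` on real tensors (names here are illustrative):

```python
import math

def latents_are_finite(values):
    """Return False if any cached value is NaN or Inf.

    Stand-in for torch.isfinite(latents).all().item() on a real tensor;
    `values` is a flat list of floats for illustration.
    """
    return all(math.isfinite(v) for v in values)

# fp16 tops out around 65504, so an overflowing VAE activation becomes
# inf, and downstream arithmetic turns it into NaN in the cached latent.
good_latent = [0.12, -1.5, 3.25]
bad_latent = [0.12, float("nan"), 3.25]
print(latents_are_finite(good_latent))  # True
print(latents_are_finite(bad_latent))   # False
```

Rejecting (or re-encoding) a latent that fails this check at cache time keeps one bad sample from poisoning the optimizer state later.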

On a new platform, the workarounds that are required for all platforms might not be in place yet. E.g. CUDA handles type casting automatically, but MPS requires strict types - any...
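The usual workaround is to cast operands to a common dtype explicitly instead of relying on the backend to do it. A sketch under that assumption (`safe_add` is a hypothetical helper, not a diffusers or torch function):

```python
import torch

def safe_add(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Promote both operands to a common dtype before the op.

    CUDA kernels will often upcast mixed-precision inputs for you;
    MPS has historically rejected some mixed-dtype ops, so casting
    explicitly keeps one code path working on both backends.
    """
    dtype = torch.promote_types(a.dtype, b.dtype)
    return a.to(dtype) + b.to(dtype)

a = torch.ones(2, dtype=torch.float16)
b = torch.ones(2, dtype=torch.float32)
print(safe_add(a, b).dtype)  # torch.float32
```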

fp16 inference was thrown out long ago:

* SDXL's VAE doesn't work with it
* SD 2.1's UNet doesn't work with it