Birch-san

175 comments of Birch-san

I have a repro (from stable-diffusion's attention forward-pass): https://github.com/CompVis/stable-diffusion/blob/69ae4b35e0a0f6ee1af8bb9a5d0016ccb27e36dc/ldm/modules/attention.py#L180

```python
from torch import einsum, ones
# crashes with "product of dimension sizes > 2**31"
# this is equivalent to invoking...
```
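The snippet above is cut off; as a hedged reconstruction, here is a minimal sketch of the failing call on the MPS backend. The einsum subscripts come from the linked attention.py line; the tensor shapes (batch×heads=16, tokens=4096, dim_head=40, i.e. SD v1 self-attention at 512×512) are my assumption, not the original repro's exact values:

```python
from torch import einsum, ones

# assumed shapes: SD v1 self-attention at 512x512
# (batch*heads = 16, tokens = 64*64 = 4096, dim_head = 40);
# the original repro's exact sizes are truncated above
q = ones(16, 4096, 40, device='mps')
k = ones(16, 4096, 40, device='mps')

# the einsum from ldm/modules/attention.py#L180; on MPS this crashed with
# "product of dimension sizes > 2**31", presumably (my reading, not
# confirmed) because einsum materializes an intermediate whose element
# count overflows a 32-bit size
sim = einsum('b i d, b j d -> b i j', q, k)
```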

Can you give some examples? Because I've done stable-diffusion inference and TI (textual inversion) training just fine on Python 3.10 with the mainline master branch, and have done inference just fine with...

@patrickvonplaten rather than having everybody save & re-upload their weights: can diffusers intercept the weights during model load and map them to different parameter names? Apple uses PyTorch's `_register_load_state_dict_pre_hook()` idiom...
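For illustration, a minimal sketch of that pre-hook idiom, renaming a legacy checkpoint key on the fly during `load_state_dict`; the module and the old/new key names here are hypothetical, not diffusers' actual parameters:

```python
import torch
from torch import nn

class Attention(nn.Module):
    def __init__(self):
        super().__init__()
        self.to_q = nn.Linear(320, 320, bias=False)
        # rewrite incoming state_dict keys at load time, so existing
        # uploads keep working without anyone re-saving their weights
        self._register_load_state_dict_pre_hook(self._remap_legacy_keys)

    def _remap_legacy_keys(self, state_dict, prefix, local_metadata, strict,
                           missing_keys, unexpected_keys, error_msgs):
        # hypothetical rename: suppose old checkpoints called to_q "query"
        old_key, new_key = f'{prefix}query.weight', f'{prefix}to_q.weight'
        if old_key in state_dict:
            state_dict[new_key] = state_dict.pop(old_key)

attn = Attention()
legacy_ckpt = {'query.weight': torch.randn(320, 320)}
attn.load_state_dict(legacy_ckpt)  # loads despite the legacy key name
```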

https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/b34b25b4c941819d34f29be6c4c1ec01e64585b4#commitcomment-86212295

I found that this mitigation works, but I assume it only works if you're consistent in the size of `x` and `context` that you submit to `CrossAttention` (e.g. same image...

personally I run pytorch stable 1.12.1 (because it's faster than 1.13RC or the nightlies https://github.com/pytorch/pytorch/issues/85297, ~https://github.com/pytorch/pytorch/issues/87010~), so I don't encounter the einsum reproducibility problem. my use-case is almost always "launch...

wasn't this problem due to this einsum bug: https://github.com/pytorch/pytorch/issues/85224? it's been solved since at least `1.13.0.dev20220928` (so it should be in the latest stable, 1.13.1). in any case: diffusers CrossAttention doesn't use einsum any...
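for reference, a sketch of the matmul-style formulation that replaced the einsum; the shapes are arbitrary, and the `baddbmm`-with-`beta=0` pattern is my rendering of the approach rather than a verbatim excerpt from diffusers:

```python
import torch

b, i, j, d = 2, 64, 77, 40  # arbitrary batch*heads, query tokens, key tokens, dim_head
q = torch.randn(b, i, d)
k = torch.randn(b, j, d)
scale = d ** -0.5

# einsum formulation (as in the old CrossAttention):
sim_einsum = torch.einsum('b i d, b j d -> b i j', q, k) * scale

# batched-matmul formulation, no einsum; with beta=0 the (uninitialized)
# input tensor is ignored, so this computes scale * (q @ k^T) per batch
sim_bmm = torch.baddbmm(
    torch.empty(b, i, j, dtype=q.dtype, device=q.device),
    q,
    k.transpose(-1, -2),
    beta=0,
    alpha=scale,
)

assert torch.allclose(sim_einsum, sim_bmm, atol=1e-5)
```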

hard disagree that a CoreML model is a substitute for having a working PyTorch MPS model. but I do think [diffusers is deterministic on MPS](https://github.com/huggingface/diffusers/issues/372#issuecomment-1374846894) anyway.

> Since those are the same I think you need to set eta to something other than the default of 0, because it isn't using the random noise at all....
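for context, this is the shape of the ancestral step calculation as I understand it from k-diffusion's `get_ancestral_step` (paraphrased from memory, so treat the exact formula as an assumption): with eta=0, `sigma_up` is 0, so no fresh noise is ever injected and the sampler is fully deterministic.

```python
def get_ancestral_step(sigma_from, sigma_to, eta=1.):
    # eta=0 short-circuits: step straight down to sigma_to, add no noise
    if not eta:
        return sigma_to, 0.
    # how much noise to re-inject after the step (capped at sigma_to)
    sigma_up = min(sigma_to, eta * (sigma_to ** 2 * (sigma_from ** 2 - sigma_to ** 2) / sigma_from ** 2) ** 0.5)
    # the lower noise level to step down to, chosen so that adding
    # sigma_up of fresh noise lands the sample back at level sigma_to
    sigma_down = (sigma_to ** 2 - sigma_up ** 2) ** 0.5
    return sigma_down, sigma_up
```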

oh wow, halving `rtol` to 0.025 **does** help `sample_dpm_adaptive` produce big sleeves similar to the ones `sample_dpmpp_2s_ancestral` converged on. target we're trying to converge on (`sample_dpmpp_2s_ancestral`, 100 steps): `sample_dpm_adaptive` eta=0.75...
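for anyone reproducing this, a sketch of the invocation with the tightened tolerance, assuming k-diffusion's `sample_dpm_adaptive` (whose default I believe is rtol=0.05); the denoiser and latent setup here are placeholders, not the actual experiment:

```python
import torch
from k_diffusion.sampling import sample_dpm_adaptive

# placeholder model: a real run would wrap the SD UNet in a k-diffusion
# denoiser; this stub exists only to make the call self-contained
class DummyDenoiser(torch.nn.Module):
    def forward(self, x, sigma):
        return torch.zeros_like(x)

x = torch.randn(1, 4, 64, 64) * 14.6  # start from roughly SD's max sigma
samples = sample_dpm_adaptive(
    DummyDenoiser(), x,
    sigma_min=0.03, sigma_max=14.6,
    rtol=0.025,  # halved from the (assumed) 0.05 default: tighter error
                 # tolerance, so the adaptive controller takes smaller steps
    eta=0.75,    # same eta as in the comparison above
)
```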