Bagheera


EulerDiscreteScheduler is used in **internal** training at HuggingFace; for example, the SDXL ControlNet models were trained with `--use_euler`. but this means you can't train v-prediction models with that... because...

that's not the point i was making; they use it internally on _epsilon_ prediction models. it's just that training on Euler isn't unusual, and it's even done inside HF itself...

can you check whether the decay works correctly for you in a small test loop? for me, it does not, and i have to do:

```py
ema_unet.optimization_step = global_step
```

...
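
for reference, a minimal sketch of the kind of test loop i mean, using diffusers' `EMAModel` with a toy module standing in for the unet (the toy module, step count, and hyperparameters are placeholders, not what the script actually uses):

```py
import torch
from diffusers.training_utils import EMAModel

# toy stand-in for the unet; only the decay bookkeeping matters here
model = torch.nn.Linear(4, 4)
ema_unet = EMAModel(model.parameters(), decay=0.9999, use_ema_warmup=True)

global_step = 0
for _ in range(1000):
    global_step += 1
    # ... optimizer.step() would go here in a real loop ...
    ema_unet.step(model.parameters())
    # the workaround from above: keep the EMA step counter in sync manually
    ema_unet.optimization_step = global_step

# with warmup enabled, this should ramp towards `decay` as steps accumulate
print(ema_unet.cur_decay_value)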

i understand that the other script does this, but that change is from May 2023. if anything, we should fix the method used by this script and enhance the other...

if we can rely on the fixed bf16 AdamW optimiser, we can save the storage space for the fp32 weights. overall, training becomes more efficient and reliable. thoughts?
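
rough back-of-the-envelope numbers for what that saves (the parameter count below is an assumption on my part, just to illustrate the scale, not a measurement):

```py
# illustrative storage arithmetic only; none of these numbers are measured
params = 2.6e9                       # assumed parameter count, roughly SDXL-UNet scale
gib = 2 ** 30

fp32_master_copy = params * 4 / gib  # the fp32 weight copy we could drop
adam_states_fp32 = params * 8 / gib  # exp_avg + exp_avg_sq kept in fp32
adam_states_bf16 = params * 4 / gib  # the same states kept in bf16

print(f"fp32 master weights:   {fp32_master_copy:.1f} GiB")
print(f"AdamW states (fp32):   {adam_states_fp32:.1f} GiB")
print(f"AdamW states (bf16):   {adam_states_bf16:.1f} GiB")
```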

> ```
> unet = UNet2DConditionModel.from_pretrained(
>     args.pretrained_model_name_or_path, subfolder="unet", revision=args.revision
> ).to(weight_dtype)
> ```
>
> Would it apply to float16 too?

i can try, but since the sd 2.1...

the other crappy thing is that, without autocast, we have to use the same dtype for the vae and unet. this is probably fine because the SD 1.x/2.x VAE handles fp16 like...
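
a toy repro of what i mean, with plain linear layers standing in for the vae and unet (no SD weights involved, just the dtype behaviour):

```py
import torch

# stand-ins for the vae and unet in different dtypes
vae_like = torch.nn.Linear(8, 8).to(torch.bfloat16)
unet_like = torch.nn.Linear(8, 8)  # stays in fp32

latents = vae_like(torch.randn(1, 8, dtype=torch.bfloat16))

try:
    # without autocast the bf16 latents hit fp32 weights and the matmul errors out
    unet_like(latents)
except RuntimeError as e:
    print("without autocast:", e)

with torch.autocast("cpu", dtype=torch.bfloat16):
    # autocast inserts the casts, so the mixed dtypes are fine here
    out = unet_like(latents)
    print("with autocast:", out.dtype)
```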

older CUDA devices emulate bf16, e.g. the T4 on Colab. Apple MPS supports it with PyTorch 2.3. AMD ROCm i think is the outlier, but it also seems to have...
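
fwiw, a quick way to check what the current device actually reports (just a capability probe, nothing specific to the training scripts):

```py
import torch

if torch.cuda.is_available():
    # native bf16 needs Ampere or newer; on older cards like the T4 recent
    # PyTorch versions may still report True because the ops are emulated
    print("cuda bf16 supported:", torch.cuda.is_bf16_supported())
elif torch.backends.mps.is_available():
    # MPS bf16 needs a recent PyTorch (2.3+ per the above)
    print("mps available, torch", torch.__version__)
else:
    print("cpu only; bf16 matmuls run but are emulated/slow on most chips")
```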

```
RuntimeError: Function 'MseLossBackward0' returned nan values in its 0th output.
```

can't do fp16 on SD 2.1, as it goes into NaN at the 0th output.
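
for context, that message comes from autograd anomaly detection; here is a tiny repro that triggers the same error by injecting a non-finite value directly (in real fp16 training the NaN comes from an overflow somewhere earlier in the forward pass):

```py
import torch

torch.autograd.set_detect_anomaly(True)

pred = torch.tensor([1.0], requires_grad=True)
target = torch.tensor([0.0])

# simulate an fp16 overflow/NaN reaching the loss
bad_pred = pred * float("nan")
loss = torch.nn.functional.mse_loss(bad_pred, target)

# raises: RuntimeError: Function 'MseLossBackward0' returned nan values in its 0th output.
loss.backward()
```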