Bagheera

447 comments of Bagheera

those are pretty minimal, and e.g. it doesn't implement cosmap/logit-norm or any of the SD3 training details; it's just about the same as the cloneofsimo/minRF implementation. in fact it's basically identical -...
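for context, logit-norm refers to the SD3 paper's logit-normal timestep sampling: logit(t) is drawn from a normal, so t = sigmoid(n). a minimal sketch of what that looks like (the m/s defaults here are illustrative, not pulled from any particular trainer):

```python
import torch

# logit-normal timestep sampling per the SD3 paper: logit(t) ~ Normal(m, s),
# so t = sigmoid(n) with n drawn from a normal. m=0.0, s=1.0 are illustrative
# defaults, not values from any specific implementation.
def sample_logit_normal_t(batch_size: int, m: float = 0.0, s: float = 1.0) -> torch.Tensor:
    n = torch.randn(batch_size) * s + m
    return torch.sigmoid(n)  # timesteps in (0, 1), denser around 0.5
```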

you don't need an A100 for flux. imo kohya should release sooner rather than keep trying to add a million features. you can train on 16G VRAM without any quantisation at...

it's not like that at all though. fp8 is fine, especially in pytorch 2.4. you can read back through the comments in this issue to see.
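for the curious, the usual fp8 scheme is storage-only: weights live in float8 and get upcast just-in-time for compute. a minimal sketch (`torch.float8_e4m3fn` is a real pytorch dtype in recent releases; the rest is illustrative, not any specific library's API):

```python
import torch

# store a weight in fp8 to roughly halve memory vs bf16, then upcast
# at compute time. plain matmul doesn't run on fp8 tensors directly,
# hence the dequantize-before-matmul step.
w = torch.randn(4096, 4096, dtype=torch.bfloat16)
w_fp8 = w.to(torch.float8_e4m3fn)          # ~half the memory of bf16

x = torch.randn(1, 4096, dtype=torch.bfloat16)
y = x @ w_fp8.to(torch.bfloat16).T          # upcast just-in-time for the matmul
```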

also, NF4 is definitely not "on par with Pro" 🤪

mps has correctness issues and can't be relied on for training a model. MLX and Tinygrad, however, don't rely on MPS and produce correct results. i've never seen good...

the problem is probably an overflow inside pytorch's MPS code that has yet to be discovered. if you go to the pytorch issue tracker and search for `label:mps is:open` you...
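a quick way to check whether a given op is affected is to compare MPS against CPU on identical inputs (a sketch; softmax and the tolerances are just examples, not tied to the issue above):

```python
import torch

# run the same op on MPS and CPU; a large discrepancy on identical
# inputs points at an MPS correctness bug worth reporting upstream.
if torch.backends.mps.is_available():
    x = torch.randn(1024, 1024)
    cpu_out = torch.nn.functional.softmax(x * 100.0, dim=-1)
    mps_out = torch.nn.functional.softmax((x * 100.0).to("mps"), dim=-1).cpu()
    torch.testing.assert_close(mps_out, cpu_out, rtol=1e-4, atol=1e-4)
```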

actually the MLX project created an example trainer and it outperforms pretty much anything we can currently do in pytorch. i think this can be closed in favour of...

i updated it to use file extensions instead of str_pattern to search. it still works. @williamzhuk can you follow up with the other changes?
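roughly what the extension-based search looks like, as a sketch (the extension set and helper name are illustrative, not the repo's actual code):

```python
from pathlib import Path

# match files by suffix instead of a regex/str pattern: simpler and
# less error-prone. the extension list is illustrative.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}

def find_images(root: str) -> list[Path]:
    return sorted(
        p for p in Path(root).rglob("*")
        if p.suffix.lower() in IMAGE_EXTENSIONS
    )
```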

everything except atomic rename should work. in fact, we had atomic rename in the past. but it can also happen that the instance is stopped **in the middle of...

probably `--delete_bad_checkpoints` would be better? it would account for filesystems where rename isn't atomic-on-crash (basically everywhere):

> This trick doesn't work. People seem to think that this is safe becaus...
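for reference, the crash-safe version of the rename trick needs fsync on both the temp file and its directory, which is exactly what the quoted caveat is about. a POSIX-only sketch, not the project's actual code:

```python
import os

# write-to-temp-then-rename, with the fsyncs that make it crash-safe:
# fsync the temp file before os.replace, then fsync the directory so
# the rename itself is durable. without these, a crash can still leave
# a truncated or missing checkpoint behind.
def atomic_write(path: str, data: bytes) -> None:
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())        # persist the temp file's contents
    os.replace(tmp, path)           # atomic rename on POSIX
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)            # persist the directory entry itself
    finally:
        os.close(dir_fd)
```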