Bagheera

Results 447 comments of Bagheera

try `grep oom /proc/vmstat`; it shows how many OOM-killer events have occurred, and it works without dmesg access
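
a rough Python equivalent of that grep, for when you want the counters inside a script. this is only a sketch: it assumes the usual `name value` line format of /proc/vmstat and a kernel new enough to expose the `oom_kill` counter. `oom_counters` is a hypothetical helper name, not part of any tool.

```python
import os


def oom_counters(vmstat_text):
    """Return the vmstat counters whose names contain 'oom', like `grep oom /proc/vmstat`."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        # each /proc/vmstat line is assumed to be "<counter_name> <integer>"
        if len(parts) == 2 and "oom" in parts[0]:
            counters[parts[0]] = int(parts[1])
    return counters


if __name__ == "__main__" and os.path.exists("/proc/vmstat"):
    # /proc/vmstat is normally world-readable, so no root or dmesg access is needed
    with open("/proc/vmstat") as f:
        print(oom_counters(f.read()))
```

if the count goes up between runs, the OOM killer has fired in the meantime.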

added a note about this to the flux quickstart

i'm not sure i really have the bandwidth to look into this one; someone else with the equipment and ability to reproduce the issue might have to take a look,...

this was confirmed to be working on the main branch now. the train.sh script has been updated to locate and rely on the nvidia libraries in the venv.
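
for anyone wanting to do the same thing outside train.sh, here's a minimal sketch of finding those libraries. it is NOT what train.sh actually does; it assumes the pip `nvidia-*` wheels install their shared objects under `site-packages/nvidia/<component>/lib`, and the helper names are hypothetical.

```python
import glob
import os


def find_nvidia_lib_dirs(venv_root):
    """Return the lib/ directories of pip-installed nvidia-* wheels inside a venv."""
    # assumed layout: <venv>/lib/python3.X/site-packages/nvidia/<component>/lib
    pattern = os.path.join(
        venv_root, "lib", "python3*", "site-packages", "nvidia", "*", "lib"
    )
    return sorted(d for d in glob.glob(pattern) if os.path.isdir(d))


def ld_library_path(lib_dirs, existing=""):
    """Prepend lib_dirs to an existing LD_LIBRARY_PATH value (colon-separated, Linux)."""
    parts = list(lib_dirs)
    if existing:
        parts.append(existing)
    return ":".join(parts)


if __name__ == "__main__":
    venv = os.environ.get("VIRTUAL_ENV", ".venv")
    # print the value so a shell script can export it before launching training
    print(ld_library_path(find_nvidia_lib_dirs(venv), os.environ.get("LD_LIBRARY_PATH", "")))
```

LD_LIBRARY_PATH has to be set before the training process starts, which is why this prints the value for a shell script to `export` rather than setting it from inside Python.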

actually, using the `aot_eager` backend gets autograd involved, and then the dtype errors start. the gradients need to be in fp32 precision ... for a low-bit optim? 🤔

yeah, simpletuner supports finetuning diffusion models via torch-mps, with or without optimum-quanto, up to the 12B-parameter Flux model, which really benefits from quantisation, down from 30G at pure...
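
back-of-the-envelope arithmetic for why quantisation matters at 12B parameters. this only counts the transformer weights themselves (no text encoders, VAE, activations, gradients, or optimizer state, and no per-group scale overhead), so it's a lower bound rather than what you'll actually see in memory; `weight_memory_gb` is just an illustrative helper.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # 12B parameters at a few common precisions (weights only)
    for label, bits in [("bf16", 16), ("int8", 8), ("int4", 4)]:
        print(f"{label}: {weight_memory_gb(12e9, bits):.1f} GB")
```

that works out to roughly 24 GB at bf16, 12 GB at int8 and 6 GB at int4, before any of the other training overhead is added.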

either way, i'm not seeing memory savings with the 8-bit AdamW, since i need the gradients upcast to fp32. the 4-bit optim uses some ops that aren't implemented on MPS...
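
a rough per-parameter accounting that shows how fp32 gradients can cancel out the 8-bit savings. the assumption here (not stated in the comment) is that the baseline is an AdamW variant keeping both its gradients and its two moment estimates in bf16; quantization scales and temporary buffers are ignored, and `optim_bytes_per_param` is a hypothetical helper.

```python
def optim_bytes_per_param(grad_bits, state_bits, n_states=2):
    """Bytes per trainable parameter for the gradient plus optimizer state.

    Assumes AdamW-style state (two moment tensors per parameter) and ignores
    quantization scales and any temporary copies made during upcasting.
    """
    return (grad_bits + n_states * state_bits) / 8


if __name__ == "__main__":
    print("bf16 grads + bf16 AdamW states:", optim_bytes_per_param(16, 16))
    print("fp32 grads + 8-bit AdamW states:", optim_bytes_per_param(32, 8))
    print("fp32 grads + fp32 AdamW states:", optim_bytes_per_param(32, 32))
```

under those assumptions both of the first two configurations come to 6 bytes per parameter: the 4 bytes saved by shrinking the moments from bf16 to 8-bit are eaten by the extra 2 bytes each for... no, by the 2 extra bytes per gradient being counted once per moment saved, so the net is zero. put simply, 16 + 2×16 = 32 + 2×8 = 48 bits either way.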