Zach Mueller
You can check the docs here: https://huggingface.co/docs/peft/accelerate/deepspeed#compatibility-with-bitsandbytes-quantization--lora
Correct. We don’t support that currently. At least on the torch side, I believe they require the same number of GPUs per node. Does DeepSpeed not?
Are you making sure to gather the results at the end (or look on the last process only)? Otherwise you'll have inconsistent results on each GPU. Please see the chunk...
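As a minimal sketch of what that looks like (assuming a standard Accelerate evaluation loop; the function name, the `labels` key, and the accuracy metric are placeholders for illustration):

```python
import torch


def evaluate(accelerator, model, eval_dataloader):
    """Compute a metric consistently across all processes."""
    model.eval()
    correct, total = 0, 0
    for batch in eval_dataloader:
        with torch.no_grad():
            logits = model(**batch).logits
        preds = logits.argmax(dim=-1)
        # Gather from every GPU; without this, each process only sees its
        # own shard of the data and reports a different metric.
        preds, labels = accelerator.gather_for_metrics((preds, batch["labels"]))
        correct += (preds == labels).sum().item()
        total += labels.numel()
    # Report on the main process only, so you get one number, not one per GPU
    if accelerator.is_main_process:
        print(f"accuracy: {correct / total:.4f}")
```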
Yeah, last I checked I wanted to deprecate/remove MS-AMP since it's no longer maintained. If you want to get on that before I'm back @SunMarc, feel free :D...
Yes, generally that’s what we recommend doing, and then during validation we drop the extra samples during `gather_for_metrics` for an accurate calculation
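To make the dropping behavior concrete, here's a toy sketch (assuming 2 processes launched via `accelerate launch --num_processes 2`; the dataset of 9 integers and the batch size are made up):

```python
from accelerate import Accelerator
from torch.utils.data import DataLoader

accelerator = Accelerator()
# 9 samples with batch size 2: the final batch gets padded with duplicated
# samples so every process sees the same number of batches.
dataloader = accelerator.prepare(DataLoader(list(range(9)), batch_size=2))

seen = []
for batch in dataloader:
    # gather_for_metrics drops the padded duplicates on the last batch,
    # so `seen` ends up with exactly the 9 original samples.
    seen.extend(accelerator.gather_for_metrics(batch).tolist())

assert sorted(seen) == list(range(9))
```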
Looks like AWQ is another model that can't be fast-loaded. Will put in a fix
Potentially. I'm not too familiar with the AWQ codebase. The PR that likely broke this is here: https://github.com/huggingface/transformers/pull/31771

In the model definition we need to set `_supports_param_buffer_assignment = False`, which...
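For context, that flag lives on the model class itself, so the fix is roughly this (a sketch against a hypothetical model class, not the actual patch):

```python
from transformers import PreTrainedModel


class AwqBackedModel(PreTrainedModel):  # hypothetical class for illustration
    # Opt out of fast loading via direct parameter/buffer assignment;
    # the quantized weights need the slower copy-based loading path.
    _supports_param_buffer_assignment = False
```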
@stas00 I think this should fully enable you now. You can either set `enable_cpu_affinity` to `True` in your `config.yaml`, or set the env variable `ACCELERATE_CPU_AFFINITY=1`
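Concretely, either of these should work (`train.py` is a placeholder script name):

```bash
# One-off via the environment variable
ACCELERATE_CPU_AFFINITY=1 accelerate launch train.py

# Or persist it by adding this line to your config.yaml:
#   enable_cpu_affinity: true
```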
Despite my best efforts, I couldn't get away from using pynvml 😠 However, it seems to come auto-installed (probably with PyTorch?) so I did a simple import check. As we get more...
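The check itself is just a soft-dependency guard, something along these lines (a sketch, not the exact code):

```python
import importlib.util

# Treat pynvml as a soft dependency: only use it if it's importable,
# rather than hard-failing at import time.
_pynvml_available = importlib.util.find_spec("pynvml") is not None

if _pynvml_available:
    import pynvml

    pynvml.nvmlInit()  # safe to initialize NVML now
```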