Zach Mueller
You can check the docs here: https://huggingface.co/docs/peft/accelerate/deepspeed#compatibility-with-bitsandbytes-quantization--lora
Correct. We don’t support that currently. At least on the torch side, I believe they require the same number of GPUs per node. Does DeepSpeed not?
Are you making sure to gather the results at the end (or look on the last process only)? Otherwise you'll have inconsistent results on each GPU. Please see the chunk...
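As a minimal sketch of what that looks like (assuming a standard Accelerate evaluation loop; the function name, the `labels` key, and the accuracy metric are placeholders for illustration):

```python
import torch


def evaluate(accelerator, model, eval_dataloader):
    """Compute a metric consistently across all processes."""
    model.eval()
    correct, total = 0, 0
    for batch in eval_dataloader:
        with torch.no_grad():
            logits = model(**batch).logits
        preds = logits.argmax(dim=-1)
        # Gather from every GPU; without this, each process only sees its
        # own shard of the data and reports a different metric.
        preds, labels = accelerator.gather_for_metrics((preds, batch["labels"]))
        correct += (preds == labels).sum().item()
        total += labels.numel()
    # Report on the main process only, so you get one number, not one per GPU
    if accelerator.is_main_process:
        print(f"accuracy: {correct / total:.4f}")
```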
Yeah, last I checked I wanted to deprecate/remove MS-AMP since it's no longer maintained. If you want to get on that before I'm back @SunMarc, feel free :D...
Yes, generally that’s what we recommend doing, and then during validation we drop the extra samples during `gather_for_metrics` for an accurate calculation
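To make the dropping behavior concrete, here's a toy sketch (assuming 2 processes launched via `accelerate launch --num_processes 2`; the dataset of 9 integers and the batch size are made up):

```python
from accelerate import Accelerator
from torch.utils.data import DataLoader

accelerator = Accelerator()
# 9 samples with batch size 2: the final batch gets padded with duplicated
# samples so every process sees the same number of batches.
dataloader = accelerator.prepare(DataLoader(list(range(9)), batch_size=2))

seen = []
for batch in dataloader:
    # gather_for_metrics drops the padded duplicates on the last batch,
    # so `seen` ends up with exactly the 9 original samples.
    seen.extend(accelerator.gather_for_metrics(batch).tolist())

assert sorted(seen) == list(range(9))
```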
Looks like AWQ is another model that can't be fast-loaded. Will put in a fix
Potentially. I'm not too familiar with the AWQ codebase. The PR that likely broke this is here: https://github.com/huggingface/transformers/pull/31771

In the model definition we need to set `_supports_param_buffer_assignment = False`, which...
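For context, that flag lives on the model class itself, so the fix is roughly this (a sketch against a hypothetical model class, not the actual patch):

```python
from transformers import PreTrainedModel


class AwqBackedModel(PreTrainedModel):  # hypothetical class for illustration
    # Opt out of fast loading via direct parameter/buffer assignment;
    # the quantized weights need the slower copy-based loading path.
    _supports_param_buffer_assignment = False
```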
@stas00 I think this should fully enable you now. You can either set `enable_cpu_affinity` to `True` in your `config.yaml`, or set the env variable `ACCELERATE_CPU_AFFINITY=1`
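Concretely, either of these should work (`train.py` is a placeholder script name):

```bash
# One-off via the environment variable
ACCELERATE_CPU_AFFINITY=1 accelerate launch train.py

# Or persist it by adding this line to your config.yaml:
#   enable_cpu_affinity: true
```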
Despite my best efforts, I couldn't get away from using pynvml 😠 However, it seems to come auto-installed (probably with PyTorch?) so I did a simple import check. As we get more...
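The check itself is just a soft-dependency guard, something along these lines (a sketch, not the exact code):

```python
import importlib.util

# Treat pynvml as a soft dependency: only use it if it's importable,
# rather than hard-failing at import time.
_pynvml_available = importlib.util.find_spec("pynvml") is not None

if _pynvml_available:
    import pynvml

    pynvml.nvmlInit()  # safe to initialize NVML now
```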