Zach Mueller

Results 435 comments of Zach Mueller

Tbh though, the `pynvml` solution makes more sense, we can add it as a CLI option and just raise an err if it's not installed. Let me work on that...

It is not, looks like we'll need to do it the hard way without pynvml (and just run a series of bash things) given that.

No worries, while un-fun, I'm getting it working with some subprocess ;)

@stas00 if you want to try some bleeding edge stuff, just pushed some commits. Haven't fully tested it on a multi-gpu system yet, but at least the dry run of...

Let's start small with the nvidia version, then we can add the AMD and gaudi2 as follow ups. (Since we can only test the nvidia-smi version rn)
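For readers following along, here is a rough sketch of what "the hard way without pynvml" could look like for the nvidia case. It is not the code from the PR: the helper names `parse_gpu_csv` and `query_nvidia_gpus` are made up for illustration, and the only thing taken as given is that `nvidia-smi` supports `--query-gpu=...` with `--format=csv,noheader`.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse `name, memory.total` CSV rows (no header) into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        # Split on the last comma so a comma inside the GPU name would survive.
        parts = [field.strip() for field in line.rsplit(",", 1)]
        if len(parts) != 2 or not parts[0]:
            continue  # skip blank or malformed rows
        gpus.append({"name": parts[0], "memory_total": parts[1]})
    return gpus


def query_nvidia_gpus():
    """Return GPU info from nvidia-smi, or an empty list if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for index, gpu in enumerate(query_nvidia_gpus()):
        print(f"GPU {index}: {gpu['name']} ({gpu['memory_total']})")
```

Keeping the parsing separate from the subprocess call means the parser can be checked on a machine with no GPU, and AMD or Gaudi2 follow-ups would only need their own query function.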

@stas00 please see https://github.com/huggingface/accelerate/pull/2535 :)

@pjspol can you try setting `dispatch_batches=False` in the accelerator? (I can check with lucidrains too, in case that's a bit behind his APIs.) This is a known bug...

@thevasudevgupta the recommended solution of `dispatch_batches=False` is still a requirement due to changes with the torch dataloader that have led to these issues and that require a significant rewrite for us to...

The same answer as above, don’t use batch dispatching.
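As a minimal sketch of turning batch dispatching off: the helper name `make_accelerator` is hypothetical, and which branch applies depends on the installed accelerate version. Newer releases take the flag through `DataLoaderConfiguration`, while older ones accepted `dispatch_batches` directly on `Accelerator`.

```python
def make_accelerator():
    """Build an Accelerator with batch dispatching turned off.

    Hypothetical helper for illustration; imports are done lazily so the
    module itself loads even where accelerate is not installed.
    """
    from accelerate import Accelerator

    try:
        # Newer accelerate releases: the flag lives on DataLoaderConfiguration.
        from accelerate.utils import DataLoaderConfiguration
    except ImportError:
        # Older releases: the flag was a direct Accelerator keyword argument.
        return Accelerator(dispatch_batches=False)

    config = DataLoaderConfiguration(dispatch_batches=False)
    return Accelerator(dataloader_config=config)
```

With this setting each process iterates its own dataloader instead of receiving batches dispatched from the main process.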

@raghavanone @thomas-schillaci could you try building from main and seeing if that fixes the issue? I think https://github.com/huggingface/transformers/pull/24521 fixed this