Zach Mueller
In that case we won't until they do :)
Also, the failing CLI tests are fine; they're related to a CI issue that was solved yesterday
I'd recommend opening an issue on the xla repo
@jcyk correct me if I'm wrong here, but isn't there a bug in the code? `model.eval()` and/or `torch.no_grad` needs to be used so gradients don't get calculated on...
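As a minimal sketch of the point above: during evaluation you'd typically call `model.eval()` and wrap the forward pass in `torch.no_grad()` so autograd builds no graph and no gradients are computed. (A toy `nn.Linear` stands in for the actual model from the issue.)

```python
import torch
from torch import nn

model = nn.Linear(4, 2)  # toy stand-in for the real model

# eval() disables training-only behavior (dropout, batchnorm updates);
# no_grad() stops autograd from recording operations, so no gradient
# bookkeeping happens during the evaluation forward pass.
model.eval()
with torch.no_grad():
    out = model(torch.randn(3, 4))

# Because no graph was recorded, the output does not require grad.
assert out.requires_grad is False
```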
Otherwise I think a decent solution may be something like: ```python with accelerator.disable_gradient_accumulation(): ... ``` which should only be used in specific situations like this one. Once entered it...
This is also similar to https://github.com/huggingface/accelerate/issues/960 I believe, so the proposed solution can be done if we agree on it @sgugger and @jcyk :)
@jcyk exactly, hence the other solution noted there about `pause` and `resume`, which discusses your exact issue. I.e.: ```python dl1, dl2 = accelerator.prepare(dl1, dl2) for i,batch in enumerate(dl1): ... if...
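The pattern being discussed can be sketched in plain Python, independent of any Accelerate API: iterate one dataloader for training and periodically sweep a second one for evaluation. Plain lists stand in for the prepared DataLoaders here; the `pause`/`resume` mechanism itself is a proposal and is not modeled.

```python
dl1 = list(range(10))   # stand-in for the training dataloader
dl2 = ["a", "b"]        # stand-in for the eval dataloader

seen_eval = []
for i, batch in enumerate(dl1):
    # ... training step on `batch` would go here ...
    if (i + 1) % 5 == 0:        # every 5 training steps, run evaluation
        for eval_batch in dl2:  # full sweep over the second dataloader
            seen_eval.append(eval_batch)

# Two eval sweeps happen over 10 training steps.
assert seen_eval == ["a", "b", "a", "b"]
```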
Hi @jcyk, apologies for the wait on this. Could you try again by installing accelerate with `pip install git+https://github.com/huggingface/accelerate@dataloader-multistate`? You shouldn't need the if/else there, just run the code as...
@yuvalkirstain can you provide a reproducer for me? When I tested this earlier it worked out of the box with gradient accumulation automatically. And make sure you are running version 0.18.0dev, or...
@Ethan-yt can you provide a reproducer for me to test with? Thanks!