Yu Chin Fabian Lim

Results: 24 comments by Yu Chin Fabian Lim

@samvanstroud So how will this work? Let's say https://github.com/onnx/onnx/issues/4322 is resolved and `ScatterElements` min/max are supported. Would we then still require updates to `torch.onnx` so it understands how to handle `torch_scatter`?

@rusty1s I see. So then a `max` reduction via `torch.scatter_reduce` could be supported by `ScatterElements` min/max as per https://github.com/onnx/onnx/issues/4322?
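For concreteness, a minimal sketch (my own illustration, not from the thread) of the mapping in question: `torch.scatter_reduce` with `reduce="amax"` computes exactly what `ScatterElements` with `reduction="max"` describes, so an opset-18 export, the opset where onnx/onnx#4322 landed, can lower one to the other. Whether the export actually succeeds depends on your torch version's opset-18 support.

```python
import torch

class ScatterMax(torch.nn.Module):
    def forward(self, base, index, src):
        # out[index[i]] = max(base[index[i]], src[i], ...) along dim 0
        return base.scatter_reduce(0, index, src, reduce="amax")

base = torch.zeros(4)
index = torch.tensor([0, 1, 1, 3])
src = torch.tensor([1.0, 5.0, 2.0, 7.0])
print(ScatterMax()(base, index, src))  # tensor([1., 5., 0., 7.])

# ScatterElements gained reduction="min"/"max" in opset 18, so the export
# must target that opset; earlier opsets have no direct lowering for "amax".
torch.onnx.export(ScatterMax(), (base, index, src), "scatter_max.onnx",
                  opset_version=18)
```

For `torch_scatter`'s own ops (as opposed to the aten `scatter_reduce` above), `torch.onnx` would additionally need a symbolic registered via `torch.onnx.register_custom_op_symbolic`, since the exporter only understands aten ops out of the box.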

@Titus-von-Koeller @TimDettmers Sorry to hijack this issue; I'm doing something related but not exactly the same. I'm trying to use FSDP with `bitsandbytes==0.42.0` to fine-tune `EleutherAI/pythia-1b` with 8-bit weights -...
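For context, a minimal sketch of the setup being attempted (my assumed reconstruction; the actual training script is not shown in the thread): loading `EleutherAI/pythia-1b` with 8-bit weights via `transformers` + `bitsandbytes`, before handing the model to FSDP.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed setup: weights quantized to 8-bit via bitsandbytes at load time.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-1b",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
# FSDP wrapping would then be applied on top (e.g. via `accelerate launch`
# with an FSDP config); the quantized weights are where the two interact
# badly, as the later comments discuss.
```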

> Noting that this issue, although stale, remains an issue. Although optimization can run, a functional state dict cannot be saved with 8bitadam.

@152334H when you were trying this, did...
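To make the failure mode concrete, a sketch (continuing the assumed setup above) of the save path that yields a non-functional state dict: consolidating optimizer state for an FSDP model whose optimizer is `bnb.optim.Adam8bit`.

```python
import bitsandbytes as bnb
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)
# ... training steps ...

# Per the thread, the consolidated dict gathered here cannot be used to
# resume: Adam8bit keeps quantized per-parameter state plus quantization
# metadata (e.g. absmax and qmaps) that this gather path does not restore.
osd = FSDP.full_optim_state_dict(model, optimizer)
```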

@Titus-von-Koeller @TimDettmers I think the problem still remains even with BNB 0.43. The reason is that BNB performs optimizer steps with CUDA: 1. when using CPU offload, the gradients are...

@Titus-von-Koeller On the one hand, we can work around this by loading all the quantities onto the GPU, but this would be very inefficient. On the other hand, I feel the better approach...
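A rough sketch of the workaround alluded to here (illustrative only, and inefficient, as noted): shuttle the offloaded parameters and gradients to GPU around the `bnb` step, since the 8-bit optimizer dispatches CUDA kernels. Real code would also have to keep FSDP's offload bookkeeping and the lazily created optimizer state on consistent devices.

```python
def step_on_gpu(optimizer):
    """Move CPU-offloaded params/grads to GPU, step, then move them back."""
    moved = []
    for group in optimizer.param_groups:
        for p in group["params"]:
            if p.grad is not None and p.device.type == "cpu":
                p.data, p.grad = p.data.cuda(), p.grad.cuda()
                moved.append(p)
    optimizer.step()  # bnb's CUDA kernels now see CUDA tensors
    for p in moved:
        p.data, p.grad = p.data.cpu(), p.grad.cpu()
```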

> Hello @fabianlim, thank you for the Accelerate PR, and for this one reducing the memory usage with FSDP by forcing gradient synchronization at each step. An overall comment:...
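The mechanism under review, as a raw-PyTorch sketch (the actual change lives in Accelerate's gradient-accumulation handling; the variable names below are assumptions): with FSDP, wrapping non-boundary micro-batches in `no_sync()` defers the reduce-scatter and keeps full unsharded gradients alive, so forcing synchronization on every backward trades extra communication for a lower peak memory footprint.

```python
accum_steps = 4  # assumed accumulation factor

for step, batch in enumerate(loader):
    loss = fsdp_model(**batch).loss / accum_steps
    # Deliberately NOT wrapping non-boundary steps in `fsdp_model.no_sync()`:
    # every backward reduce-scatters, so gradients are resharded immediately
    # instead of accumulating in unsharded form.
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```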

> Thanks! We're getting closer.
> Also, no need to force-push your commits, we squash the commit history (and force-push makes it harder for us to track things)

Got it....

> Got it, makes sense, and thank you for all the details. Then maybe more documentation in the Accelerate and Transformers docs would help when this flag makes a...

~> (You may also need to rebase from main for the test failures)~ ~@muellerzr should I rebase now, or wait until we've resolved most of the changes?...