Gradient bucketing using a pre-defined bucket size cap

Open amithrm opened this issue 1 year ago • 2 comments

amithrm avatar Jan 30 '24 14:01 amithrm

Do you mind adding a test case?

alanwaketan avatar Jan 30 '24 19:01 alanwaketan

Added the test case and rebased @JackCaoG @alanwaketan

amithrm avatar Mar 04 '24 20:03 amithrm

@JackCaoG do you know why the build failed with "ERROR: Error initializing RemoteModule"?

jeffhataws avatar May 28 '24 20:05 jeffhataws

This PR is on a fork, so it can't use the remote cache, but there was a bug where the build still tried to query the credentials. I think we fixed this error today, so it should start building without the cache. If you rebase, the CI should start running.

JackCaoG avatar May 28 '24 20:05 JackCaoG

@JackCaoG looks like the build is still failing for some reason after rebasing. Maybe another rebase is needed?

jeffhataws avatar May 31 '24 18:05 jeffhataws

The error still seems to be related to the fork. Let me grant both of you write access, then you can open a PR directly.

JackCaoG avatar May 31 '24 20:05 JackCaoG

OK, I gave @amithrm write access.

JackCaoG avatar May 31 '24 20:05 JackCaoG

Replaced by https://github.com/pytorch/xla/pull/7216 to avoid the build issues in CI testing.

jeffhataws avatar Jun 07 '24 16:06 jeffhataws