
fix: restore replicated tensors

Open borisdayma opened this issue 2 years ago • 3 comments

What does this PR do?

Fixes restoring checkpoints where only some of the parameters are fully replicated.

I was not able to add a test (it would require sharding over multiple devices, and there is no similar existing example), but I created a simple reproduction here.
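For readers without the linked reproduction, the condition described above (a parameter tree in which only some leaves are fully replicated) can be sketched as follows. This is an illustrative setup only, not the reproduction from this PR: the parameter names, shapes, and the `data` mesh axis are hypothetical, and the `XLA_FLAGS` setting assumes a CPU-only host where simulated devices are acceptable. It does not call any checkpointing API.

```python
import os

# Assumption: simulate 8 CPU devices when no accelerators are available.
# The flag only takes effect if it is set before JAX initializes its backends.
os.environ.setdefault("XLA_FLAGS", "--xla_force_host_platform_device_count=8")

import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec

devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))  # hypothetical 1-D mesh

# Hypothetical parameter tree: one leaf will be sharded, the other replicated.
params = {
    "kernel": jnp.ones((2 * devices.size, 4)),
    "bias": jnp.zeros((4,)),
}
shardings = {
    # Rows of the kernel are split across the "data" axis.
    "kernel": NamedSharding(mesh, PartitionSpec("data")),
    # An empty PartitionSpec places a full copy on every device.
    "bias": NamedSharding(mesh, PartitionSpec()),
}
params = jax.tree_util.tree_map(jax.device_put, params, shardings)

# Report, per leaf, whether it is fully replicated across the mesh.
replicated = jax.tree_util.tree_map(
    lambda x: x.sharding.is_fully_replicated, params
)
print(replicated)
```

With more than one device, `bias` should be reported as fully replicated while `kernel` is not, which is the mixed case this PR targets when such a tree is restored from a checkpoint.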

Checklist

  • [x] This PR fixes a minor issue (e.g.: typo or small bug) or improves the docs (you can dismiss the other checks if that's the case).
  • [ ] This change is discussed in a GitHub issue/discussion (please add a link).
  • [x] The documentation and docstrings adhere to the documentation guidelines.
  • [ ] This change includes necessary high-coverage tests. (No quality testing = no merge!)

borisdayma, Jul 20 '23 21:07

Codecov Report

Merging #3217 (535ed6f) into main (c8bb930) will not change coverage. The diff coverage is 0.00%.

@@           Coverage Diff           @@
##             main    #3217   +/-   ##
=======================================
  Coverage   82.32%   82.32%           
=======================================
  Files          54       54           
  Lines        6071     6071           
=======================================
  Hits         4998     4998           
  Misses       1073     1073           
Impacted Files                  Coverage Δ
flax/training/orbax_utils.py    69.44% <0.00%> (ø)

codecov-commenter, Jul 20 '23 22:07

https://github.com/google/flax/pull/3229 includes your change and fixes another issue that appeared with it. Feel free to give it a try!

IvyZX, Jul 26 '23 23:07

Thanks, I tested the main branch and my bug example works now.

borisdayma, Jul 27 '23 19:07