Save O_PROJ on Fuji 70B-v2 for TRN2
Saving the out-projection (O_PROJ) outputs improves training throughput while still fitting in memory on the mesh defined by neuron-(trn2|trn2n).48xlarge-64.
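For readers less familiar with rematerialization: here, "saving" a tensor presumably means keeping its forward-pass value in memory for the backward pass instead of recomputing it. This trades extra memory for less recomputation, which is why it helps throughput only when the mesh still has headroom. The following is a minimal plain-JAX sketch of that idea, assuming a toy block. The functions `block` and `loss` and the tag name "o_proj" are hypothetical illustrations, and this is not how the axlearn Fuji config expresses its remat policy.

```python
import jax
import jax.numpy as jnp
from jax.ad_checkpoint import checkpoint_name


def block(params, x):
    """Hypothetical toy block: an input projection, a nonlinearity, then an output projection."""
    w_in, w_out = params
    h = jnp.tanh(x @ w_in)
    # Tag the output-projection result so a remat policy can refer to it by name.
    # "o_proj" is an illustrative name chosen for this sketch.
    out = checkpoint_name(h @ w_out, "o_proj")
    return jnp.tanh(out)


def loss(params, x):
    return jnp.sum(block(params, x) ** 2)


# Recompute every intermediate in the backward pass except tensors tagged "o_proj",
# which are saved from the forward pass instead.
remat_loss = jax.checkpoint(
    loss, policy=jax.checkpoint_policies.save_only_these_names("o_proj")
)

key_in, key_out, key_x = jax.random.split(jax.random.PRNGKey(0), 3)
params = (
    jax.random.normal(key_in, (8, 16)),
    jax.random.normal(key_out, (16, 8)),
)
x = jax.random.normal(key_x, (4, 8))

# The policy changes what is stored versus recomputed, not the math.
grads_plain = jax.grad(loss)(params, x)
grads_remat = jax.grad(remat_loss)(params, x)
```

The two gradient computations should agree numerically. Only the memory/compute trade-off differs, and `jax.ad_checkpoint.print_saved_residuals` can be used to inspect which tensors the policy keeps.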
Rebased the PR to fix failing CI
I see that CI fails for the test TestEvaluateFromFile.test_evaluate_from_eval_set with the following error:
#22 453.1 axlearn/open_api/common.py:440: KeyError
This failure is unrelated to the changes in this PR; I already rebased the PR onto the latest changes as of 12th May. Could I please get some guidance on how to fix this? Thank you!
I'm disabling this test here: https://github.com/apple/axlearn/pull/1184 cc @gyin94
Rebased to pick up the change that disables the flaky test.
This pull request has been automatically marked as stale because it has been inactive for 60 days. It will be closed in 7 days if no further activity occurs. If you would like to continue working on this, please remove the stale label or leave a comment.