Sam Stoelinga

Results 223 comments of Sam Stoelinga

Chatted with @Ethanlm: the issue with the current PR is that it doesn't allow users to modify XLA flags in axlearn code. Please hold off on merging for now. I...

@Ethanlm can you please review the PR again now that I've added the ability to add and override any of the existing XLA flags?

@markblee could I get a final review from you so we can get it merged?

@markblee could you review once more please? I've addressed all your comments.

@markblee could you please review once more? I moved get_xla_options and get_megascale_options to pathways_utils.py. Nothing else changed. I also re-ran my manual test.

Good catch! Apologies for somehow messing up the deletion in compiler_options.py. I reviewed the final changes and also moved the parse_xla_flag function to pathways_utils.py. Please review once more.

Seems the error is unrelated to my PR: https://github.com/apple/axlearn/actions/runs/15166874891/job/42647986834?pr=1163#step:8:5949
```
#22 155.3 /opt/venv/lib/python3.10/site-packages/transformers/integrations/tensor_parallel.py:465: in __init__
#22 155.3 self.input_layouts = (input_layouts or Replicate(),)
#22 155.3 E NameError: name 'Replicate' is not...
```

Merged with latest main in the hope that it would fix the CI issue. Could you re-trigger? Specifically, #1205 may fix the CI failure in my run.

@markblee checks are now passing after merging latest main. Are we good to get this merged?

Failure seems unrelated to my PR:
```
The hosted runner: GitHub Actions 314 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for...
```