sallyjunjun
When I test a model with full recomputation, the forward all-gather communication is not overlapped, because is_grad_enabled is false during the forward pass under full recomputation. I see the following code in _LayerNormLinear...
When I run run_open_llama_w_vescale.py with torch version 2.5.1+cu124, I get the following error:
[rank4]: Traceback (most recent call last):
[rank4]: File "/code/veScale/examples/open_llama_4D_benchmark/run_open_llama_w_vescale-ljx.py", line 104, in
[rank4]: vescale_model = parallelize_module(model, device_mesh["TP"],...
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand...