inkcherry
Add `MPICHRunner` class. This PR allows users to run DeepSpeed with the MPICH launcher. We verified that it works with Megatron-DeepSpeed in multi-node training.
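As an illustration only (this is not the PR's actual implementation, and the class layout, method names, and constructor arguments below are assumptions), a runner of this kind typically just composes an `mpiexec` command line for the user script:

```python
# Minimal sketch of an MPICH-style runner. Names and signatures are
# illustrative assumptions, not DeepSpeed's real MPICHRunner.
import shlex


class MPICHRunner:
    """Builds an `mpiexec` command line for multi-node launches."""

    def __init__(self, user_script, user_args, hosts, procs_per_node):
        self.user_script = user_script
        self.user_args = user_args
        self.hosts = hosts                  # e.g. ["node1", "node2"]
        self.procs_per_node = procs_per_node

    def get_cmd(self):
        total_procs = len(self.hosts) * self.procs_per_node
        return [
            "mpiexec",
            "-n", str(total_procs),
            "-ppn", str(self.procs_per_node),   # MPICH: processes per node
            "-hosts", ",".join(self.hosts),
            "python", self.user_script,
        ] + self.user_args


runner = MPICHRunner("train.py", ["--deepspeed"], ["node1", "node2"], 8)
print(shlex.join(runner.get_cmd()))
```

On the user side this would presumably be selected through DeepSpeed's existing `--launcher` option (e.g. `deepspeed --launcher MPICH ...`), alongside the other supported multi-node launchers.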
This is an experimental demo of autoTP training, not intended for review. Apologies for the somewhat rough draft; I hope it clarifies the process. Currently, I have tested pure TP (DP=1...
Save the time and memory overhead of maintaining flattened buffers.
Use dp_world_size for grad reduction instead of seq_dp_world_size. Currently, for ZeRO-0, only sparse tensors use the correct world size. Tiny-model grad-norm test with sp=4: grad_norm | step1 | step2...
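A minimal arithmetic sketch of the scaling issue, under my reading of the PR (all names and numbers below are illustrative assumptions): ranks in the same sequence-parallel group hold shards of the same samples rather than independent replicas, so averaging reduced gradients should divide by the data-parallel size only.

```python
# Illustrative only: why dividing by seq_dp_world_size over-scales.
world_size = 16
sp_size = 4                          # sequence-parallel group size
dp_size = world_size // sp_size      # 4 data-parallel replicas

grad_sum = 32.0                      # one grad entry summed across all ranks

wrong = grad_sum / (dp_size * sp_size)   # seq_dp_world_size: too small
right = grad_sum / dp_size               # dp_world_size: correct average
print(wrong, right)                      # 2.0 8.0
```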
Running the script scripts/pretrain.sh directly may raise ModuleNotFoundError: No module named 'llava'. Implement automatic configuration of the Python search path to include the necessary directory. Related issue: https://github.com/haotian-liu/LLaVA/issues/1571
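Whether the fix lands in the shell script (via PYTHONPATH) or in the Python entry point, the effect is the same. A minimal Python-side sketch, assuming the standard LLaVA repo layout where the entry script sits one directory below the repo root that contains the `llava/` package:

```python
# Sketch: make `import llava` work when the script is run directly,
# without requiring `pip install -e .`. Assumes this file lives one
# level below the repository root.
import os
import sys

repo_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)

import llava  # resolves against the repo root added above
```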
In sequence_parallel (Ulysses), the sequence-parallel size is constrained to evenly divide the number of attention heads, which prevents some models/workloads from setting a specific sequence-parallel...
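A short sketch of the constraint being relaxed here (illustrative names, not the actual Ulysses code): the Ulysses all-to-all re-shards activations from sequence-split to head-split, so each rank must receive a whole number of heads.

```python
# Illustrative check of the head-divisibility constraint in Ulysses-style
# sequence parallelism. Function name is an assumption.
def check_ulysses_sp_size(num_heads: int, sp_size: int) -> int:
    """Return heads per rank, or raise if sp_size doesn't divide heads."""
    if num_heads % sp_size != 0:
        raise ValueError(
            f"num_heads={num_heads} must be divisible by "
            f"sequence-parallel size {sp_size}"
        )
    return num_heads // sp_size


print(check_ulysses_sp_size(32, 8))   # ok: 4 heads per rank
try:
    check_ulysses_sp_size(32, 6)      # 6 does not divide 32
except ValueError as e:
    print(e)
```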
FYI, @hwchen2017
Fix CI hang; improve the unit tests.
WIP; pending perf test.