DeepSpeed
fix num_kv_heads sharding in uneven autoTP for Falcon-40b
Falcon-40b fails with uneven autoTP because its KV head count is stored under 'num_kv_heads', which is not recognized. The fix is to add 'num_kv_heads' to the kv_head_names list.
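For illustration only, here is a minimal sketch of why the attribute name matters. It is not DeepSpeed's actual code: `KV_HEAD_NAMES`, `find_kv_heads`, and `heads_per_rank` are hypothetical helpers, and the head counts in the sample config are assumed values. The idea is that the KV head count is looked up by name in the model config; if the name is missing from the list, the lookup finds nothing and the heads cannot be split unevenly across ranks.

```python
from types import SimpleNamespace

# Hypothetical list of config attribute names that may hold the KV head count.
# The PR adds 'num_kv_heads' to the equivalent list in DeepSpeed.
KV_HEAD_NAMES = ["num_key_value_heads", "num_kv_heads"]


def find_kv_heads(config, names=KV_HEAD_NAMES):
    """Return the first KV head count found under any known name, else None."""
    for name in names:
        value = getattr(config, name, None)
        if value is not None:
            return value
    return None


def heads_per_rank(num_heads, world_size):
    """Split heads across ranks; the first `num_heads % world_size` ranks get one extra."""
    base, extra = divmod(num_heads, world_size)
    return [base + (1 if rank < extra else 0) for rank in range(world_size)]


# Assumed, illustrative config values for a Falcon-style model.
falcon_like = SimpleNamespace(num_kv_heads=8, num_attention_heads=128)

kv_heads = find_kv_heads(falcon_like)
print(kv_heads, heads_per_rank(kv_heads, world_size=3))
```

Without 'num_kv_heads' in the list, `find_kv_heads` would return `None` for this config, which mirrors the failure mode described above.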
Hi @RezaYazdaniAminabadi, this PR fixes Falcon-40b autoTP with uneven sharding, e.g. on 3 ranks. Can this PR be reviewed? Thanks!
@tjruwase, from the failure log this looks like an environment issue. Has it already been resolved? Thanks!
FAILED tests/deepspeed/test_deepspeed.py::TrainerIntegrationDeepSpeed::test_early_get_last_lr_zero2_fp16 - deepspeed.ops.op_builder.builder.CUDAMismatchException: >- DeepSpeed Op Builder: Installed CUDA version 11.6 does not match the version torch was compiled with 12.1, unable to compile cuda/cpp extensions without a matching cuda version.
@delock, apologies for the delay on this. The team is gradually returning from the holidays. This will be resolved asap.