
fix num_kv_heads sharding in uneven autoTP for Falcon-40b

Open Yejing-Lai opened this issue 2 years ago • 1 comments

Falcon-40b fails with uneven autoTP. The fix is to add 'num_kv_heads' to the kv_head_names list.

Yejing-Lai avatar Nov 21 '23 10:11 Yejing-Lai

Hi @RezaYazdaniAminabadi, this PR fixes Falcon-40b autoTP with uneven sharding, e.g. on 3 ranks. Could this PR be reviewed? Thanks!

delock avatar Dec 04 '23 09:12 delock
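
For readers following along, here is a minimal sketch of why the attribute name matters for uneven sharding. It is not DeepSpeed's actual code: `find_kv_heads` and `shard_sizes` are hypothetical helpers, the config is a plain dict standing in for the model config, the value of 8 KV heads for Falcon-40b is an assumption, and giving the remainder to the lowest ranks is only one possible policy.

```python
from typing import List, Optional


def find_kv_heads(config: dict, kv_head_names: List[str]) -> Optional[int]:
    """Return the first KV-head count found under any known attribute name.

    Hypothetical helper: if the model's attribute name is missing from
    kv_head_names, the lookup finds nothing and sharding cannot account
    for the KV heads.
    """
    for name in kv_head_names:
        if name in config:
            return config[name]
    return None


def shard_sizes(total_heads: int, world_size: int) -> List[int]:
    """Split heads across ranks, giving the remainder to the lowest ranks.

    This remainder policy is an assumption made for illustration.
    """
    base, extra = divmod(total_heads, world_size)
    return [base + 1 if rank < extra else base for rank in range(world_size)]


# Assumed Falcon-40b-style config: the KV-head count lives under 'num_kv_heads'.
falcon_like_config = {"num_kv_heads": 8}

# Without 'num_kv_heads' in the list, the KV-head count is not found.
missing = find_kv_heads(falcon_like_config, ["num_key_value_heads"])

# With 'num_kv_heads' added, the count is found and can be split over 3 ranks.
found = find_kv_heads(falcon_like_config, ["num_key_value_heads", "num_kv_heads"])
sizes = shard_sizes(found, 3)
```

Under these assumptions, 8 KV heads cannot be divided evenly over 3 ranks, so each rank must receive a whole number of heads that together sum to 8.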

@tjruwase, from the failure log it seems like an environment issue. Has this already been resolved? Thanks!

FAILED tests/deepspeed/test_deepspeed.py::TrainerIntegrationDeepSpeed::test_early_get_last_lr_zero2_fp16 - deepspeed.ops.op_builder.builder.CUDAMismatchException: >- DeepSpeed Op Builder: Installed CUDA version 11.6 does not match the version torch was compiled with 12.1, unable to compile cuda/cpp extensions without a matching cuda version.

delock avatar Jan 03 '24 03:01 delock

@delock, apologies for the delay on this. The team is gradually returning from the holidays. This will be resolved asap.

tjruwase avatar Jan 03 '24 15:01 tjruwase