DeepSpeed
DeepSpeed copied to clipboard
DeepSpeedCheckpoint: support custom final ln idx
till today only last layer (idx=-1) was considered using FINAL_LAYER_NORM_INDEX which is set to -1. this PR allows the user to pass custom value for model where this default value does not apply. see example for usage in HabanaAI/Megatron-DeepSpeed fork repository: https://github.com/HabanaAI/Megatron-DeepSpeed/blob/c9feb8cacabc6dd4da4266cff08db555a21122e2/tools/verify_checkpoint_non_tp_consistency.py#L296