DeepSpeedExamples
'gamma', 'theta' not found in progressive layer drop
Hi! Thank you guys for the tool and the example.
I've been trying to reproduce 'progressive layer dropping' on RoBERTa and other pretraining methods, but I couldn't find where `gamma` and `theta`, which are specified in deepspeed_bsz4k_progressive_layer_drop_config_seq128.json, are actually used in the project.
For 'theta', I found this code at line 1160 of nvidia/modelingpreln_layerdrop.py: `theta = kwargs.get('pld_theta', 1.0)`, but the key there is 'pld_theta' rather than 'theta'.
For 'gamma', which should control the drop-rate schedule, I couldn't find it used anywhere (see the sketch below for what I expected it to do).
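For reference, my understanding from the PLD paper is that these two values are supposed to define the keep-probability schedule, roughly theta(t) = (1 - theta_bar) * exp(-gamma * t) + theta_bar, where theta_bar is the 'theta' from the JSON config. A minimal sketch of what I expected to find somewhere in the code (the function name and default values here are my own assumptions, not from the repo):

```python
import math

def pld_keep_prob(step: int, gamma: float = 0.001, theta: float = 0.5) -> float:
    """Keep probability at a given training step, per my reading of the
    PLD paper: starts at 1.0 (keep all layers) and decays toward `theta`
    (the floor from the JSON config), with `gamma` setting the decay speed.
    Hypothetical reconstruction, not code from this repository.
    """
    return (1.0 - theta) * math.exp(-gamma * step) + theta

# e.g. pld_keep_prob(0) == 1.0; for large steps it approaches theta (0.5)
```

If the schedule is implemented differently (or `gamma` is consumed elsewhere, e.g. inside DeepSpeed itself rather than this example repo), a pointer to that code would clear this up.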
Please let me know if I've missed anything. Thank you very much.