
'gamma', 'theta' not found in progressive layer drop

Open marchen00 opened this issue 3 years ago • 3 comments

Hi! Thank you guys for the tool and the example. I've been trying to reproduce 'progressive layer dropping' on RoBERTa and other pretraining methods, but I couldn't find where gamma and theta, which are stated in deepspeed_bsz4k_progressive_layer_drop_config_seq128.json, are used in the project.

For 'theta', I found the code at nvidia/modelingpreln_layerdrop.py line 1160, `theta = kwargs.get('pld_theta', 1.0)`, but the key there is 'pld_theta' rather than 'theta'.

For 'gamma', which should control the drop-rate scheduling, I couldn't find it used anywhere.

Please kindly let me know if I missed anything. Thank you very much.

marchen00 avatar Mar 14 '22 08:03 marchen00


I am also facing this problem. When running SQuAD, I get the error `unexpect scope output.LayerNorm name in transformer layer.` in `load_hf_weights_in_bert_kernel`. I then added the code `elif name_str.find("output.LayerNorm") > 0: logger.info("Ignore Huggingface weight {} with shape {}".format(name_str, array.shape)) continue`, but after that, NaN or Inf values appear during training. Have you solved this problem?

FlyingCat-fa avatar Apr 16 '22 05:04 FlyingCat-fa

@FatCockHu, can you please open a separate ticket for your error? Thanks!

tjruwase avatar Apr 18 '22 13:04 tjruwase

@marchen00, the PLD implementation is split between the DeepSpeed engine and the client. In particular, DeepSpeed maintains the theta and gamma values here, and with this logic makes them available for the client's forward pass, as you highlighted. So 'theta' and 'gamma' from the JSON config are consumed by the engine's drop-rate scheduler, which then passes the current keep probability to the model as the 'pld_theta' kwarg you found. If you have not done so already, it might be helpful to go through the associated tutorial.
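To make the split concrete, here is a minimal sketch of the drop-rate schedule the engine maintains. This is an illustrative reimplementation based on the Progressive Layer Dropping paper's schedule, not the exact DeepSpeed source; the function name `pld_keep_prob` is hypothetical. The keep probability starts at 1.0 (no layers dropped) and decays toward the configured `theta` at a rate set by `gamma` as the global step increases:

```python
import math

def pld_keep_prob(step, theta=0.5, gamma=0.001):
    # Hypothetical sketch of the PLD schedule: the keep probability
    # decays from 1.0 toward the floor `theta`, with `gamma` controlling
    # how quickly it decays over training steps.
    return (1.0 - theta) * math.exp(-gamma * step) + theta

# At step 0 nothing is dropped; later steps approach the floor theta.
print(pld_keep_prob(0))       # 1.0
print(pld_keep_prob(5000))    # between theta and 1.0, closer to theta
```

The engine would evaluate something like this each step and hand the result to the model, which is why the client-side code only ever sees `kwargs.get('pld_theta', 1.0)` rather than 'theta' or 'gamma' directly.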

@minjiaz, FYI

tjruwase avatar Apr 18 '22 13:04 tjruwase