DeepSpeedExamples
'gamma', 'theta' not found in progressive layer drop
Hi! Thank you guys for the tool and the example.
I've been trying to reproduce 'progressive layer dropping' on RoBERTa and other pretraining methods, but I couldn't find where the gamma and theta values stated in deepspeed_bsz4k_progressive_layer_drop_config_seq128.json are used in the project.
For 'theta', I found code in nvidia/modelingpreln_layerdrop.py line 1160, theta = kwargs.get('pld_theta', 1.0), but the key there is 'pld_theta' rather than 'theta'.
For 'gamma', which should be used for drop-rate scheduling, I couldn't find it used anywhere.
Please kindly let me know if I'm missing anything, thank you very much.
I've also run into this problem.
When running SQuAD, I get the error "unexpect scope output.LayerNorm name in transformer layer." in load_hf_weights_in_bert_kernel.
Then I added this code:

```python
elif name_str.find("output.LayerNorm") > 0:
    logger.info("Ignore Huggingface weight {} with shape {}".format(name_str, array.shape))
    continue
```
After that, NaN or Inf values appear during training.
Have you solved the problem?
@FatCockHu, can you please open a separate ticket for your error? Thanks!
@marchen00, the PLD implementation is split between the DeepSpeed engine and the client. In particular, DeepSpeed maintains the theta and gamma values here, and with this logic makes them available for the client's forward pass, as you highlighted. If you have not done so already, it might be helpful to go through the associated tutorial.
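Roughly, the two config values parameterize the keep-probability schedule from the PLD paper: theta is the floor the keep probability decays toward, and gamma controls how fast it decays over training steps. A minimal sketch of that schedule (the function name `pld_keep_prob` is illustrative, not DeepSpeed's actual API; the engine evaluates the same formula each step and forwards the result to the model as `pld_theta`):

```python
import math

def pld_keep_prob(step, theta=0.5, gamma=0.001):
    """Keep probability for Progressive Layer Drop at a given global step.

    Starts at 1.0 (every layer kept) and decays exponentially toward the
    configured floor `theta`, following theta(t) = (1 - theta) * exp(-gamma * t) + theta.
    """
    return (1.0 - theta) * math.exp(-gamma * step) + theta

# At step 0 all layers are kept; late in training the keep probability
# approaches the configured theta floor.
print(pld_keep_prob(0))       # → 1.0
print(pld_keep_prob(10000))   # ≈ 0.5 (exp(-10) is negligible)
```

With larger gamma the decay is faster, so layers are dropped more aggressively earlier in training.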
@minjiaz, FYI