unilm
LayoutLM NaN Loss while Training
Hey, I am having an issue where the loss output by the model during training is NaN. This usually happens after the 3rd epoch. I am training on a custom dataset with 29 classes and 40,000 data points.

The steps followed are identical to this notebook, except for a few tweaks: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLMv2/RVL-CDIP/Fine_tuning_LayoutLMv2ForSequenceClassification_on_RVL_CDIP.ipynb

The training is being done on an AWS SageMaker notebook instance. The Accelerate API (the notebook_launcher() function, to be more precise) is also used to run training on multiple GPUs. Moreover, the logits for all test-case predictions are also NaN.
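Not from the original notebook, but for anyone hitting this: a NaN loss that appears after a few epochs is often caused by exploding gradients or a too-high learning rate. Below is a minimal, hedged sketch (toy model and data as stand-ins, not the actual LayoutLMv2 setup) of the usual mitigations: a lower learning rate, gradient-norm clipping, and a guard that skips a batch whose loss is non-finite instead of letting NaN propagate into the weights.

```python
# Minimal sketch of NaN-loss mitigations in a PyTorch training loop.
# The Linear model and random data are toy stand-ins for
# LayoutLMv2ForSequenceClassification and the encoded document batches.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 29)  # stand-in; 29 output classes as in the issue
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # try a smaller lr
loss_fn = nn.CrossEntropyLoss()

for step in range(20):
    x = torch.randn(8, 10)            # stand-in batch of features
    y = torch.randint(0, 29, (8,))    # stand-in labels
    logits = model(x)
    loss = loss_fn(logits, y)

    if not torch.isfinite(loss):
        # Guard: skip this batch rather than backpropagating NaN/Inf
        optimizer.zero_grad()
        continue

    loss.backward()
    # Cap the global gradient norm to damp exploding gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad()
```

Running the training once with `torch.autograd.set_detect_anomaly(True)` can also pinpoint the exact op that first produces the NaN, though it slows training noticeably.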
Any help to this is much appreciated.
Thank you
@sathwikacharya did you resolve this issue? I'm facing the same problem, especially after switching from microsoft/layoutlmv3-base to microsoft/layoutlmv3-large.