DFGN-pytorch
Inquiry about ‘nan’ in pickle file
Thanks for your excellent work. Some 'nan' values occur in the pkl file downloaded from your Google Drive, but the final result is as good as you've reported. Actually, I ran some demos, extracted the BERT embeddings, and found that the absolute values in each dimension are mostly under 2, with no 'nan'. How do those 'nan' values affect the whole network? Could they trigger gradient explosion?
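For anyone who wants to reproduce the observation, here is a minimal sketch of how the 'nan' entries in a pickle file could be counted. It does not assume anything about how the repository's pkl file is laid out: it simply walks nested dicts, lists, tuples, and sets, and treats any object with a `tolist()` method (NumPy arrays and PyTorch tensors both have one) as array-like. The helper names `count_nans` and `count_nans_in_pickle` are hypothetical and not part of DFGN-pytorch.

```python
import math
import pickle


def count_nans(obj):
    """Recursively count NaN floats inside nested containers."""
    if isinstance(obj, float):
        return int(math.isnan(obj))
    if isinstance(obj, dict):
        return sum(count_nans(v) for v in obj.values())
    if isinstance(obj, (list, tuple, set)):
        return sum(count_nans(v) for v in obj)
    # Assumption: array-like objects (NumPy arrays, PyTorch tensors)
    # expose tolist(), which converts them to nested Python lists.
    if hasattr(obj, "tolist"):
        return count_nans(obj.tolist())
    # Strings, ints, None, and unknown objects are not counted.
    return 0


def count_nans_in_pickle(path):
    """Load a pickle file and count the NaN values it contains."""
    with open(path, "rb") as f:
        return count_nans(pickle.load(f))
```

Calling `count_nans_in_pickle` on the path of the downloaded file would return the total number of 'nan' entries. Converting large arrays with `tolist()` is slow, so this is only meant as a quick check, not something to run inside a training loop.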