vokenization A problem about ClassificationHead in the model.py

Open Shimao-Zhang opened this issue 1 year ago • 0 comments

Thanks for your great work! I noticed that the voken classification head in model.py consists of a non-linear layer with GELU, a LayerNorm operation, and a linear layer called decoder, which is different from what is described in the paper. In the paper, it is a linear layer followed by a softmax layer. Do the two variants perform similarly, or am I misunderstanding something?
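To make the comparison concrete, here is a minimal PyTorch sketch of the two heads as I understand them. It is not copied from model.py: the class names `PaperStyleHead` and `RepoStyleHead`, the attribute names other than `decoder`, the layer order (dense, GELU, LayerNorm, decoder), and the sizes are all my assumptions for illustration.

```python
import torch
import torch.nn as nn


class PaperStyleHead(nn.Module):
    """Head as described in the paper: one linear layer, softmax on top.

    Hypothetical name; only the linear + softmax structure comes from the paper.
    """

    def __init__(self, hidden_size: int, num_vokens: int):
        super().__init__()
        self.decoder = nn.Linear(hidden_size, num_vokens)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Returns logits; softmax is applied by the caller or inside the loss.
        return self.decoder(hidden_states)


class RepoStyleHead(nn.Module):
    """Head as I read it in the code: dense + GELU + LayerNorm, then decoder.

    Hypothetical name; the exact layer order and eps value are assumptions.
    """

    def __init__(self, hidden_size: int, num_vokens: int, eps: float = 1e-12):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.act = nn.GELU()
        self.layer_norm = nn.LayerNorm(hidden_size, eps=eps)
        self.decoder = nn.Linear(hidden_size, num_vokens)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Extra non-linear transform before the final projection to voken logits.
        x = self.layer_norm(self.act(self.dense(hidden_states)))
        return self.decoder(x)


if __name__ == "__main__":
    torch.manual_seed(0)
    hidden = torch.randn(2, 5, 16)  # (batch, sequence, hidden_size), toy sizes
    for head in (PaperStyleHead(16, 50), RepoStyleHead(16, 50)):
        probs = torch.softmax(head(hidden), dim=-1)  # distribution over vokens
        print(type(head).__name__, tuple(probs.shape))
```

In both variants the softmax would normally be folded into the loss, since `nn.CrossEntropyLoss` expects raw logits. So the difference I am asking about is only the extra dense + GELU + LayerNorm transform before the decoder.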

May 03 '23 12:05 Shimao-Zhang
