Why do stacked embedding methods sometimes underperform?
I observe that stacking one embedding with another sometimes leads to lower accuracy than either embedding on its own. I am working on low-resource Indian languages. When I use stacked embeddings (GloVe, BPE, BERT, XLM-R, MuRIL) together with IndicBERT, I get lower POS-tagging accuracy than with IndicBERT alone, even though in most other cases stacking improves accuracy. Why does this happen?
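For context, my understanding is that stacking in flair simply concatenates the per-token vectors of each embedding, so the tagger's input dimension grows with every embedding added. A minimal sketch of that effect (the 100-d and 768-d sizes are just illustrative of a static embedding and a transformer embedding):

```python
import numpy as np

# Hypothetical per-token vectors: a 100-d static (GloVe-style)
# embedding and a 768-d transformer embedding.
glove_vec = np.random.rand(100)
indicbert_vec = np.random.rand(768)

# Stacking concatenates the individual vectors, so each added
# embedding widens the input the POS tagger must learn from.
stacked_vec = np.concatenate([glove_vec, indicbert_vec])

print(stacked_vec.shape)  # (868,)
```

My hypothesis is that on a small POS corpus this much wider input is harder to fit, and a weaker embedding in the stack may add noise that drowns out the signal from the strong one, but I would like to confirm whether that explanation is right.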