
How to filter only relevant labels in LayoutLMv2

Open Abhishekvats1997 opened this issue 3 years ago • 2 comments

Hi, first of all, thanks a lot for your work. I followed your tutorial on data preparation for CORD and fine-tuned on my custom dataset, which is pretty similar. I only had annotations for the required labels and none for the "Other" tokens. Now, when I predict with my trained model, every piece of text is assigned one label or another, when the output should contain predictions for my 5 labels only. Is there any way to filter predictions by probability or something similar? Or do I need to annotate the non-relevant tokens as "other" and retrain? If so, can I somehow automate the creation of these "other" annotations? And of course, this is happening when I am doing true inference.

Abhishekvats1997 avatar May 30 '22 00:05 Abhishekvats1997

Hi. As far as I can see, there is no Other label in the CORD dataset.

Other appears in the iob_to_label function:

def iob_to_label(label):
    # Strip the IOB prefix ("B-" or "I-"); a bare "O" tag leaves an empty string.
    label = label[2:]
    if not label:
        return 'other'
    return label
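
As a rough sketch of how this function could be used to drop 'other' tokens from the output: the helper `keep_labeled`, the sample words, and the tag names (`ITEM`, `QTY`, `TOTAL`) are all made up for illustration and are not part of the tutorial. It also assumes the model can actually emit an `O` tag, which requires `O` to have been present in the training labels.

```python
def iob_to_label(label):
    # Strip the IOB prefix ("B-" or "I-"); a bare "O" tag leaves an empty string.
    label = label[2:]
    if not label:
        return 'other'
    return label


def keep_labeled(words, tags):
    """Return (word, label) pairs, skipping every token tagged as 'other'."""
    pairs = []
    for word, tag in zip(words, tags):
        label = iob_to_label(tag)
        if label != 'other':
            pairs.append((word, label))
    return pairs


# Hypothetical sample data, one predicted IOB tag per word.
words = ["Latte", "x2", "Thank", "you", "9.00"]
predicted_tags = ["B-ITEM", "B-QTY", "O", "O", "B-TOTAL"]

print(keep_labeled(words, predicted_tags))
```

If the model was trained without any `O` examples, it will never predict `O`, so this filter alone will not remove anything.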

Update: Other is the label for tokens that the model did not assign to any class. You can get rid of them by modifying the logic in the function above (or the code that calls it), or you can train your model with more samples or more detailed labeling. Either way, some tokens will still come out as other (or unpredicted) as long as there is some probability of a failed prediction.
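
Regarding filtering by probability, here is a minimal sketch of one possible approach: apply a softmax to each token's logits and fall back to 'other' when the top probability is below a threshold. The function names, the `id2label` mapping, the logits, and the 0.9 threshold are all assumptions for illustration. It uses plain Python lists; with a real model the per-token logits would presumably come from something like `outputs.logits[0].tolist()`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_by_confidence(token_logits, id2label, threshold=0.9):
    """Label each token, replacing low-confidence predictions with 'other'."""
    results = []
    for logits in token_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            results.append((id2label[best], probs[best]))
        else:
            results.append(("other", probs[best]))
    return results


# Hypothetical two-class example: one row of logits per token.
id2label = {0: "B-ITEM", 1: "B-TOTAL"}
token_logits = [[6.0, 0.5], [1.0, 1.2], [0.2, 5.0]]

labels = [label for label, _ in filter_by_confidence(token_logits, id2label)]
print(labels)
```

A model trained without any "other" examples may still be confidently wrong on irrelevant text, so a threshold like this is probably a stopgap; tagging the unannotated tokens as `O` and retraining is likely the more reliable fix.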

fraps12 avatar Jul 07 '22 07:07 fraps12

Hi @Abhishekvats1997, did you solve the issue?

sheikhasim avatar Jul 27 '22 15:07 sheikhasim