Possibility of only annotating needed parts

Open m485liuw opened this issue 2 years ago • 2 comments

Hi, I'm wondering if there's a way to make GROBID label only the parts we specify (e.g. abstract, title), and to change the ground truth of the training data so that it includes only those labels. Do you think this is doable, and would it negatively affect the training result?

m485liuw, Jul 31 '21 21:07

Hello @m485liuw !

My experience so far is that it would negatively impact the accuracy of the remaining labels:

https://github.com/kermitt2/grobid/issues/777#issuecomment-870170270

I actually introduced extra labels in GROBID only to improve the core ones, and did the same in other sequence labelling projects.

What is your motivation for doing this?

kermitt2, Jul 31 '21 22:07

Hi, thanks for the reply. The motivation is that we only care about improving some of the labels, so we don't want to spend time annotating the others. But as you said, you introduced the other labels precisely to improve the core ones, so I guess keeping them is the best way.

m485liuw, Aug 02 '21 22:08