Add training data for one special error case

Open lfoppiano opened this issue 2 years ago • 5 comments

This PR provides training data to improve recognition for the error case reported in #632. As discussed, I've also added a couple of sentences on author contributions to the guidelines.

Where I was not sure, I commented directly in the XML files.

lfoppiano avatar Aug 12 '21 01:08 lfoppiano

Hello Luca! Would it be possible to send me the PDF for these training cases?

kermitt2 avatar Dec 19 '21 15:12 kermitt2

This is just one document. The PDF was linked in the issue: https://www.nature.com/articles/s41598-020-58065-9.pdf

lfoppiano avatar Apr 18 '22 06:04 lfoppiano

The raw header file needs to be regenerated after retraining the segmentation model, because some content has been added to the segmentation file.

I think it would be better to roll back the changes to the guidelines here, because they will conflict with the more recent ones made elsewhere and they are redundant.

kermitt2 avatar Aug 09 '22 23:08 kermitt2

It might be appropriate to merge this PR into the feature/data-availability-statement branch first rather than into master, because the training-related changes spread across different branches make things difficult to follow and to update consistently (the raw files and the model in particular).

kermitt2 avatar Aug 09 '22 23:08 kermitt2

Thanks for the review! I'll close this and add these files (without the guideline changes) directly to feature/data-availability-statement.

lfoppiano avatar Aug 12 '22 06:08 lfoppiano

Closing because this has been moved into feature/data-availability-statement.

kermitt2 avatar Sep 25 '22 15:09 kermitt2