grobid
Incorrect sentence coordinates
Sentences sometimes have wrong coordinates.
Sample files used (PDF, TEI and training files): 60806_R1.zip
Notes:
- Borders are rendered by our application, based on the s[coords] values of the TEI elements (which are usually correct)
- The GROBID segmentation model has been trained on these PDFs (and the fulltext model "recognises the refs correctly")
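For readers who want to reproduce the rendering, here is a minimal sketch of how a `coords` value can be turned into rectangles. It assumes the usual GROBID layout of semicolon-separated boxes, each written as `page,x,y,width,height`. The names `Box` and `parse_coords` are hypothetical helpers for illustration only; they are not part of GROBID or of our application.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """One bounding box taken from a TEI coords attribute (hypothetical helper)."""
    page: int
    x: float
    y: float
    width: float
    height: float


def parse_coords(coords: str) -> List[Box]:
    """Split a coords value into boxes.

    Assumes boxes are separated by ';' and each box is 'page,x,y,width,height'.
    Whitespace and line breaks around a box are ignored.
    """
    boxes = []
    for chunk in coords.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing ';' or an empty segment
        page, x, y, width, height = chunk.split(",")
        boxes.append(Box(int(page), float(x), float(y), float(width), float(height)))
    return boxes


# Value copied from the <head> element of the TEI excerpt in this thread.
boxes = parse_coords("7,90.02,257.53,24.00,10.80;7,144.02,257.53,245.59,10.80")
for box in boxes:
    print(box)
```

Each resulting box can then be drawn as one border on the given page, which is why a sentence spanning several lines yields several rectangles.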
Case: sentence with a <ref> element containing the ; character
Incorrect coordinates
Example 1
PDF (coordinates rendering)
First bugged sentence

Group of bugged sentences

All sentences of this page

TEI (processing)

Note: the right-hand part of the ref (after the ; character) is no longer in this file
TEI (training)
Note: the entire ref is in this file
Correct coordinates
Example 1
PDF (coordinates rendering)

TEI (processing)

TEI (training)

Example 2
PDF (coordinates rendering)

TEI (processing)

TEI (training)

Hi @NicolasKieffer !
Many thanks for the very clear and documented issue.
It is fixed in GROBID master; it was just one missing line in the "if" statement :/ The coordinates of the full sentence are now correct too.
<head n="2.1.1" coords="7,90.02,257.53,24.00,10.80;7,144.02,257.53,245.59,10.80">Static Model:
a single season occupancy analysis</head>
<p>
<s coords="7,90.02,285.13,391.18,10.80;7,72.02,312.73,464.39,10.80;7,72.02,340.33,467.45,10.80;
7,72.02,367.93,465.80,10.80;7,72.02,395.53,272.00,10.80">The MSDOM is a form of the
multi-state occupancy model with state uncertainty
<ref type="bibr" coords="7,72.02,312.73,118.87,10.80" target="#b59">(MacKenzie et al., 2009;</ref>
<ref type="bibr" coords="7,193.90,312.73,97.62,10.80">Nichols et al., 2007)</ref> and is defined below
with four states equivalent to the original co-occurrence model
The fix will be part of the next release, 0.7.1, which should come out in the next few days.
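As a quick way to verify output like the excerpt above, one possible sanity check is to confirm that every `<ref>` box lies inside one of the boxes of its enclosing sentence. This is only a sketch under assumptions: the function names and the 0.5 tolerance are hypothetical, the `coords` layout is taken to be `page,x,y,width,height` boxes separated by `;`, and none of this reflects how GROBID itself computes or validates coordinates.

```python
from typing import List, Tuple

# (page, x, y, width, height); assumed layout of one coords box
Box = Tuple[int, float, float, float, float]


def parse_coords(coords: str) -> List[Box]:
    """Parse 'page,x,y,w,h;page,x,y,w,h;...' into a list of boxes."""
    boxes = []
    for chunk in coords.split(";"):
        chunk = chunk.strip()
        if chunk:
            page, x, y, w, h = chunk.split(",")
            boxes.append((int(page), float(x), float(y), float(w), float(h)))
    return boxes


def contains(outer: Box, inner: Box, tol: float = 0.5) -> bool:
    """True if inner lies within outer on the same page, up to a small tolerance."""
    op, ox, oy, ow, oh = outer
    ip, ix, iy, iw, ih = inner
    return (
        op == ip
        and ix >= ox - tol
        and iy >= oy - tol
        and ix + iw <= ox + ow + tol
        and iy + ih <= oy + oh + tol
    )


def ref_inside_sentence(sentence_coords: str, ref_coords: str) -> bool:
    """True if every ref box is contained in at least one sentence box."""
    sentence_boxes = parse_coords(sentence_coords)
    return all(
        any(contains(s, r) for s in sentence_boxes)
        for r in parse_coords(ref_coords)
    )


# Values copied from the fixed TEI excerpt in this thread.
SENTENCE = (
    "7,90.02,285.13,391.18,10.80;7,72.02,312.73,464.39,10.80;"
    "7,72.02,340.33,467.45,10.80;7,72.02,367.93,465.80,10.80;"
    "7,72.02,395.53,272.00,10.80"
)
first_ref_ok = ref_inside_sentence(SENTENCE, "7,72.02,312.73,118.87,10.80")
second_ref_ok = ref_inside_sentence(SENTENCE, "7,193.90,312.73,97.62,10.80")
print(first_ref_ok, second_ref_ok)
```

A ref box that falls outside every sentence box would make the check return False, which is a cheap way to flag sentences worth inspecting by eye in the rendered PDF.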