Incorrect sentence coordinates

Open NicolasKieffer opened this issue 3 years ago • 1 comments

Sentences sometimes have wrong coordinates.

Sample files used (PDF, TEI & training files): 60806_R1.zip

Notes:

  • borders are rendered by our application, based on the coords values of the TEI s elements (s[coords]), which are usually correct
  • the GROBID segmentation model has been trained on these PDFs (and the fulltext model "recognises the refs correctly")
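
For readers reproducing the rendering, here is a minimal sketch of how a coords value can be turned into rectangles. It assumes each box is written as "page,x,y,width,height" and that boxes are separated by ";", which matches the values quoted in this thread. The names Box and parse_coords are hypothetical and not part of GROBID or of our application.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """One bounding box (assumed layout: page, x, y, width, height)."""
    page: int
    x: float
    y: float
    width: float
    height: float


def parse_coords(coords: str) -> List[Box]:
    """Split a coords attribute value into a list of boxes.

    Boxes are assumed to be separated by ';' and fields by ','.
    Whitespace and line breaks around a box are ignored.
    """
    boxes = []
    for chunk in coords.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing ';' or an empty segment
        page, x, y, width, height = chunk.split(",")
        boxes.append(Box(int(page), float(x), float(y), float(width), float(height)))
    return boxes


if __name__ == "__main__":
    # Hypothetical usage: print one rectangle per line of a two-box value.
    for box in parse_coords("7,90.02,257.53,24.00,10.80;7,144.02,257.53,245.59,10.80"):
        print(box)
```

Each resulting box corresponds to one rectangle drawn on the page, so a sentence spanning several lines yields several boxes.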

Case: sentence with a <ref> element containing the ";" character

Incorrect coordinates

Example 1

PDF (coordinates rendering)

First bugged sentence image

Group of bugged sentence image

All sentences of this page image

TEI (processing)

image

Note: the right part of the ref (everything after the ";" character) is missing from this file

TEI (training)

image

Note: the entire ref is in this file

Correct coordinates

Example 1

PDF (coordinates rendering)

image image

TEI (processing)

image

TEI (training)

image

Example 2

PDF (coordinates rendering)

image image

TEI (processing)

image

TEI (training)

image

NicolasKieffer avatar Apr 15 '22 15:04 NicolasKieffer

Hi @NicolasKieffer !

Many thanks for the very clear and documented issue.

It is fixed in Grobid master; it was just one missing line in the "if" statement :/ The coordinates of the full sentence are now correct too.

<head n="2.1.1" coords="7,90.02,257.53,24.00,10.80;7,144.02,257.53,245.59,10.80">Static Model: 
a single season occupancy analysis</head>
<p>
     <s coords="7,90.02,285.13,391.18,10.80;7,72.02,312.73,464.39,10.80;7,72.02,340.33,467.45,10.80;
                        7,72.02,367.93,465.80,10.80;7,72.02,395.53,272.00,10.80">The MSDOM is a form of the 
     multi-state occupancy model with state uncertainty 
     <ref type="bibr" coords="7,72.02,312.73,118.87,10.80" target="#b59">(MacKenzie et al., 2009;</ref> 
     <ref type="bibr" coords="7,193.90,312.73,97.62,10.80">Nichols et al., 2007)</ref> and is defined below 
     with four states equivalent to the original co-occurrence model 
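
An excerpt like the one above can be sanity-checked by verifying that every box of a <ref> lies inside one of the boxes of its enclosing <s>. The sketch below is only an illustration of such a check, not the fix applied in Grobid. It assumes the "page,x,y,width,height" layout with ";" between boxes, and the function names and the tolerance value are hypothetical.

```python
from typing import List, Tuple

# Assumed box layout: (page, x, y, width, height)
Box = Tuple[int, float, float, float, float]


def parse_coords(coords: str) -> List[Box]:
    """Parse a coords attribute value into boxes, ignoring surrounding whitespace."""
    boxes = []
    for chunk in coords.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        page, x, y, w, h = chunk.split(",")
        boxes.append((int(page), float(x), float(y), float(w), float(h)))
    return boxes


def contains(outer: Box, inner: Box, tol: float = 0.01) -> bool:
    """True if inner is on the same page and inside outer, within a small tolerance."""
    o_page, ox, oy, ow, oh = outer
    i_page, ix, iy, iw, ih = inner
    return (
        o_page == i_page
        and ix >= ox - tol
        and iy >= oy - tol
        and ix + iw <= ox + ow + tol
        and iy + ih <= oy + oh + tol
    )


def ref_inside_sentence(sentence_coords: str, ref_coords: str, tol: float = 0.01) -> bool:
    """True if every ref box is covered by at least one sentence box."""
    sentence_boxes = parse_coords(sentence_coords)
    return all(
        any(contains(s_box, r_box, tol) for s_box in sentence_boxes)
        for r_box in parse_coords(ref_coords)
    )


if __name__ == "__main__":
    # Values copied from the excerpt above (the sentence value is joined onto one line).
    sentence = (
        "7,90.02,285.13,391.18,10.80;7,72.02,312.73,464.39,10.80;"
        "7,72.02,340.33,467.45,10.80;7,72.02,367.93,465.80,10.80;"
        "7,72.02,395.53,272.00,10.80"
    )
    for ref in ("7,72.02,312.73,118.87,10.80", "7,193.90,312.73,97.62,10.80"):
        print(ref, ref_inside_sentence(sentence, ref))
```

A ref box that falls outside every sentence box would be a hint that the sentence coordinates were computed from an incomplete set of tokens.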

The fix will be part of the next release, 0.7.1, which should come in the next few days.

kermitt2 avatar Apr 16 '22 15:04 kermitt2