"Pages" in `title_j`

Open cverluise opened this issue 4 years ago • 0 comments

Around 0.8% of the NPL publications in the beta dataset have "Pages" as `title_j`.

How to reproduce the behaviour

-- Join the parsed citations whose title_j is "Pages" with their raw NPL text
SELECT
  *
FROM (
  SELECT
    *
  FROM
    `npl-parsing.patcit.beta`
  WHERE
    title_j = "Pages") AS parsing
JOIN (
  SELECT
    npl_publn_id AS id,
    npl_biblio
  FROM
    `usptobias.patstat.tls214`) AS tls214
ON
  tls214.id = parsing.npl_publn_id

Ideas / Solution

The issue seems to be closely related to the one described in #14.

These citations seem to share a common pattern: they are already well structured (e.g. ENTNEHEMEN UND PRUEFEN MIT EINEM SCHNELLEN HANDHABUNGSGERAET', KUNSTSTOFFE,DE,CARL HANSER VERLAG. MUNCHEN, vol. 80, no. 8, 1 August 1990 (1990-08-01), pages 894, XP000150775, ISSN: 0023-5563).
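To make the "already well structured" pattern concrete, here is a minimal Python sketch that pulls the labelled fields out of the example citation quoted above with regular expressions. It is only an illustration of the pattern, not the fix proposed in this issue (which is to retrain Grobid). The names `FIELD_PATTERNS`, `extract_labelled_fields` and `looks_structured` are hypothetical and not part of PatCit or Grobid, and the patterns are assumptions derived from this single example.

```python
import re

# Raw citation string quoted in this issue (copied verbatim).
CITATION = (
    "ENTNEHEMEN UND PRUEFEN MIT EINEM SCHNELLEN HANDHABUNGSGERAET', "
    "KUNSTSTOFFE,DE,CARL HANSER VERLAG. MUNCHEN, vol. 80, no. 8, "
    "1 August 1990 (1990-08-01), pages 894, XP000150775, ISSN: 0023-5563"
)

# Hypothetical patterns for the labelled fields (assumption: inferred from
# the single example above, so other citations may need different rules).
FIELD_PATTERNS = {
    "volume": re.compile(r"\bvol\.\s*(\d+)"),
    "issue": re.compile(r"\bno\.\s*(\d+)"),
    "date": re.compile(r"\((\d{4}-\d{2}-\d{2})\)"),
    "pages": re.compile(r"\bpages?\s+(\d+(?:\s*-\s*\d+)?)"),
    "xp_number": re.compile(r"\b(XP\d+)\b"),
    "issn": re.compile(r"\bISSN:\s*(\d{4}-\d{3}[\dX])"),
}


def extract_labelled_fields(citation: str) -> dict:
    """Return every labelled field whose pattern matches the citation."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(citation)
        if match:
            fields[name] = match.group(1)
    return fields


def looks_structured(citation: str, min_fields: int = 3) -> bool:
    """Heuristic: a citation is 'structured' if enough labelled fields match."""
    return len(extract_labelled_fields(citation)) >= min_fields


if __name__ == "__main__":
    print(extract_labelled_fields(CITATION))
    print(looks_structured(CITATION))
```

A heuristic like `looks_structured` could, for instance, help collect candidate training examples from the rows returned by the query above. The `min_fields` threshold of 3 is arbitrary.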

As for #14, training the Grobid model on these examples seems to be the best option. The examples affected by this issue would then be processed again.

cverluise, Nov 09 '19 12:11