PatCit
"Pages" in `title_j`
Around 0.8% of the NPL publications in the beta dataset have "Pages" as `title_j`.
How to reproduce the behaviour
SELECT
  *
FROM (
  SELECT
    *
  FROM
    `npl-parsing.patcit.beta`
  WHERE
    title_j = "Pages"
) AS parsing
JOIN (
  SELECT
    npl_publn_id AS id,
    npl_biblio
  FROM
    `usptobias.patstat.tls214`
) AS tls214
ON
  tls214.id = parsing.npl_publn_id
Ideas / Solution
The issue seems to be closely related to the one described in #14.
The affected citations seem to share a common pattern: they are already well structured (e.g. ENTNEHEMEN UND PRUEFEN MIT EINEM SCHNELLEN HANDHABUNGSGERAET', KUNSTSTOFFE,DE,CARL HANSER VERLAG. MUNCHEN, vol. 80, no. 8, 1 August 1990 (1990-08-01), pages 894, XP000150775, ISSN: 0023-5563).
As with #14, training the Grobid model on these examples seems to be the best option. The examples affected by this issue would then be processed again.
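To collect such examples for training or reprocessing, the "already well structured" pattern could be approximated with a simple heuristic over the raw `npl_biblio` string. The sketch below is only an illustration: the helper names, the regular expressions and the three-field threshold are all assumptions, and none of it is part of PatCit or Grobid.

```python
import re
from typing import List

# Hypothetical heuristic (not part of PatCit or Grobid): a citation is treated
# as "structured" when it carries several explicitly labelled fields.
FIELD_PATTERNS = {
    "volume": re.compile(r"\bvol\.\s*\d+", re.IGNORECASE),
    "issue": re.compile(r"\bno\.\s*\d+", re.IGNORECASE),
    "pages": re.compile(r"\bpages?\s+\d+", re.IGNORECASE),
    "xp_number": re.compile(r"\bXP\d+"),
    "issn": re.compile(r"\bISSN:\s*\d{4}-\d{3}[\dX]"),
}


def labelled_fields(npl_biblio: str) -> List[str]:
    """Return the names of the labelled fields found in a raw citation."""
    return [
        name
        for name, pattern in FIELD_PATTERNS.items()
        if pattern.search(npl_biblio)
    ]


def is_structured(npl_biblio: str, min_fields: int = 3) -> bool:
    """Flag citations with at least `min_fields` labelled fields (assumed threshold)."""
    return len(labelled_fields(npl_biblio)) >= min_fields


# The example citation quoted in this issue.
example = (
    "ENTNEHEMEN UND PRUEFEN MIT EINEM SCHNELLEN HANDHABUNGSGERAET', "
    "KUNSTSTOFFE,DE,CARL HANSER VERLAG. MUNCHEN, vol. 80, no. 8, "
    "1 August 1990 (1990-08-01), pages 894, XP000150775, ISSN: 0023-5563"
)

print(labelled_fields(example))
print(is_structured(example))
```

Running something like this over the `npl_biblio` values returned by the query above would give a rough estimate of how many of the affected citations follow this pattern. The threshold and patterns would need checking against real data.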