Patrice Lopez

Results 390 comments of Patrice Lopez

@borkdude thank you for reporting these errors - I think the best solution would be a more robust dehyphenization process (it normally works reasonably well...). The problem with...

1. Yes, that's why in this case we apply another dehyphenization method (dehyphenizeHard()), which does not expect a line break. This explains why `mor- tality` is correctly dehyphenized as `mortality`...
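To illustrate the idea of a "hard" dehyphenization that does not depend on a line break, here is a minimal Python sketch. It is not Grobid's `dehyphenizeHard()` (which is Java and works on layout tokens); the function name `dehyphenize_hard` and the regular expression are assumptions made for illustration only.

```python
import re

# Hypothetical pattern (an assumption, not Grobid's rule): a hyphen that
# directly follows a letter, is followed by whitespace (a space or a
# newline), and then by another letter.
_HYPHEN_GAP = re.compile(r"(?<=[A-Za-z])-\s+(?=[A-Za-z])")


def dehyphenize_hard(text: str) -> str:
    """Join word halves split by a hyphen plus whitespace.

    Unlike a line-break-based approach, this does not require the
    whitespace after the hyphen to be a newline.
    """
    return _HYPHEN_GAP.sub("", text)


if __name__ == "__main__":
    print(dehyphenize_hard("infant mor- tality rate"))
```

A hyphen with no whitespace after it (as in `well-known`) is left alone. A naive rule like this would also wrongly join suspended hyphens such as `pre- and post-test`, which is one reason a more robust process is needed.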

OK I see, you were talking about the abstract for the clusteror. As I said the old-fashioned Header model is not using LayoutToken for decoding the CRF results, it follows...

There are differences, in particular I see a loss in citation metadata and an improvement on abstracts. However, the only way to be sure is to run it on the same...

Thank you @elonzh! I need to add this to the customized schema. This is a problem with standard TEI: it can't go under `` because the DOI would refer...

The schemas have been updated with f1265a140d7c7eb231e5d1b6280480f2e6a271a7 and everything looks good:

```
lopez@work:~/grobid$ xmllint --noout --schema grobid-home/schemas/xsd/Grobid.xsd ~/Downloads/PhysRevC.100.014306\(1\).pdf.tei.xml
/home/lopez/Downloads/PhysRevC.100.014306.tei.xml validates
```

Just tested... PR #701 does not fix it unfortunately, same error. But this is an opportunity to both check this PR and fix the bug :)

(removing comment, it was more for #811 !) https://github.com/kermitt2/grobid/issues/811#issuecomment-898320941 but it's relevant to the fact that we don't remove references, just keep track of the positions. The text at this...

Hello @jacksongoode! Consolidation of headers is enabled by default (this is helpful for accuracy/quality). I think `0.7.0` uses https://cloud.science-miner.com/glutton by default, which was more reliable than CrossRef web...

Hi @andrei-volkau Thanks for the issue ! Just to be sure, how do you highlight the PDF in your example? Are you using the page dimensions provided by Grobid or...