Patrice Lopez

601 comments by Patrice Lopez

Hello @frankang ! Currently the full text model (which parses the article body) simply identifies the global formula areas, and optionally a reference label for the formula. There's no parsing...

Hello @NorbertSandor! Windows is no longer supported by Grobid; the binaries for pdfalto are missing. @cerulij it's likely that the binaries for pdfalto shipped with Grobid are not compatible...

@NorbertSandor just pointing to the documentation about the non-support of Windows: https://grobid.readthedocs.io/en/latest/Troubleshooting/#windows-related-issues Using Docker is a convenient way to have a Grobid service working on Windows, see https://grobid.readthedocs.io/en/latest/Grobid-docker/
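Once a Grobid container is running, a quick way to confirm the service is reachable from the Windows host is a small health check. This is a minimal sketch, assuming the service is exposed on `http://localhost:8070` and answers on the `isalive` endpoint; the helper names `grobid_url` and `is_alive` are hypothetical and the base URL should be adjusted to the actual port mapping.

```python
import urllib.error
import urllib.request


def grobid_url(base_url, endpoint):
    """Build the URL of a Grobid REST endpoint (assumed to live under /api/)."""
    return base_url.rstrip("/") + "/api/" + endpoint.lstrip("/")


def is_alive(base_url="http://localhost:8070", timeout=5.0):
    """Return True if the service answers 'true' on the isalive endpoint."""
    try:
        with urllib.request.urlopen(grobid_url(base_url, "isalive"), timeout=timeout) as resp:
            body = resp.read().decode("utf-8").strip().lower()
            return resp.status == 200 and body == "true"
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure: treat as "not reachable".
        return False
```

Calling `is_alive()` right after starting the container gives a simple yes/no answer before sending any PDF to the service.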

I am not sure it is a good idea: - we should reasonably expect logging when using the Docker image, because it's typically production usage - I think it's...

Hello, arXiv identifiers are well supported by the bibliographical reference models (nearly 2000 annotated examples have been added with arXiv ids); however, the header model has not yet been updated...

Hello @philgooch! DOI detection has worked exactly like this in the header for something like 4-5 years. This works very well for highly discriminative identifiers indeed. The advantage of...

arXiv ids are now well supported both for citations (for quite a long time) and in the metadata header. In the original example PDFs of this issue: 1801.00857.pdf -> ```xml...

Hello @mfeblowitz! I guess in this case you are producing a PDF from the HTML page, correct? With which tool are you generating this PDF? Apparently with Firefox/Linux, the...

Do you have an example of such a PDF? Where does it come from? This article seems to have been published originally in HTML. The problem applies similarly to native PDF using embedded...

Hmm, check whether "fi" or "ff" occurs in the text or not? At least it would cover the ligature case, but the embedded font issue can happen for many characters in...
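The "fi"/"ff" check suggested above could look roughly like the following. It is only a sketch of the heuristic, not something taken from Grobid: the function name `looks_ligature_damaged` and the `min_length` threshold are assumptions, and as noted it only covers the ligature case, not other characters affected by embedded fonts.

```python
import unicodedata


def looks_ligature_damaged(text, min_length=2000):
    """Heuristic: a long English text with no "fi" and no "ff" at all
    has probably lost its ligatures during PDF text extraction."""
    # NFKC maps Unicode ligature code points (e.g. U+FB01) back to plain
    # letters, so correctly extracted ligatures are not counted as damage.
    normalized = unicodedata.normalize("NFKC", text).lower()
    if len(normalized) < min_length:
        # Too short to draw any conclusion (min_length is an arbitrary choice).
        return False
    return "fi" not in normalized and "ff" not in normalized
```

A text flagged this way would still need a closer look, since the threshold is arbitrary and the check assumes a language where "fi" and "ff" are frequent.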