Patrice Lopez

Results 77 issues of Patrice Lopez

In the following example, https://arxiv.org/pdf/2103.12028v1.pdf, there are cases of wrong sentence segmentation, with sentence offsets apparently shifted by a few characters, resulting in words being cut. This happens whatever the selected...

bug
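
A minimal sketch of how shifted sentence offsets like those described above could be detected, assuming sentences are given as `(start, end)` character spans over the extracted text. The function name `find_word_cuts` and the sample text are hypothetical and not part of GROBID; the check simply flags spans whose boundary falls between two alphanumeric characters.

```python
from typing import List, Tuple


def find_word_cuts(text: str, spans: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the spans whose start or end offset falls inside a word.

    A boundary at position p is considered "inside a word" when the
    characters on both sides of it (text[p - 1] and text[p]) are alphanumeric.
    """
    def splits_word(pos: int) -> bool:
        # Boundaries at the very start or end of the text cannot cut a word.
        if pos <= 0 or pos >= len(text):
            return False
        return text[pos - 1].isalnum() and text[pos].isalnum()

    return [(start, end) for start, end in spans
            if splits_word(start) or splits_word(end)]


# Hypothetical example: the second list has offsets shifted by two characters.
sample = "First sentence. Second one."
good_spans = [(0, 15), (16, 27)]
shifted_spans = [(0, 13), (14, 27)]
```

Running such a check over the sentence spans of a processed document would give a quick list of suspicious segments to inspect by hand.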

See #688. As dynamically loading native libraries is not possible with JDK > 10 (or only with complicated hacks depending on the JVM version), this PR tries to set the `java.library.path`...

Dropwizard and Prometheus use different conventions for naming metrics. We use Dropwizard as the REST API framework, and the Prometheus Java Suite collects the Dropwizard metrics to produce http://localhost:8071/metrics/prometheus, but it is...

enhancement
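
To illustrate the naming difference only: Dropwizard metric names are typically dotted, while Prometheus metric names must match `[a-zA-Z_:][a-zA-Z0-9_:]*`. The sketch below shows one simple way such names could be mapped. The function `to_prometheus_name` and the sample metric names are hypothetical, and this is not necessarily what the Prometheus Java client does internally.

```python
import re

# Characters that are not allowed anywhere in a Prometheus metric name.
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_:]")
# A valid name must start with a letter, an underscore or a colon.
_VALID_FIRST_CHAR = re.compile(r"[a-zA-Z_:]")


def to_prometheus_name(dropwizard_name: str) -> str:
    """Map a dotted Dropwizard-style name to a valid Prometheus metric name."""
    sanitized = _INVALID_CHARS.sub("_", dropwizard_name)
    # Prefix an underscore when the first character is missing or invalid
    # (for example when the name starts with a digit).
    if not _VALID_FIRST_CHAR.match(sanitized):
        sanitized = "_" + sanitized
    return sanitized


# Hypothetical metric names, for illustration only.
print(to_prometheus_name("org.example.Service.requests-total"))
print(to_prometheus_name("5xx.responses"))
```

A mapping like this loses the hierarchy encoded in the dots, which is one reason the two conventions do not translate cleanly into each other.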

I am opening a specific issue for this, although it was mentioned in #603. The JEP instance will itself load the native JEP library, but for this to happen we need...

enhancement
need help

Regarding header/metadata, the following PLOS article is correctly processed with the processFulltextDocument service (correct DOI, journal, etc.). However, in the case of processHeaderDocument, the wrong DOI is selected (the one for the...

error cases

Jitpack now fails: > Git error. Max repo size 500MB exceeded This is painful, because the artefacts themselves are just 15MB for the jar and 76MB for the one-jar. The...

Thanks to @Aazhar, here is a document where accent compositions still fail. See author `Mélanie`/`Me 'lanie` or `Université`/`Universite ',`. This is apparently a problem with pdfalto, but keeping track...

bug

There's a relatively high number of PDFs with hidden content (usually white on white), which impacts the grobid processing more or less severely. Two cases I want to make visible/highlight...

enhancement
error cases

This PR sets up DeLFT to use TensorBoard: logs, callbacks... The idea is to cover the interesting metrics so that everything is visualized in TensorBoard. How to use: basically nothing special,...

For transformer input, we started with BERT, and therefore with WordPiece. For sequence labeling, we typically have pre-segmented sentences `(w_0, ..., w_n)` with expected labels `(l_0,...,l_n)` and optionally some aligned...
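
The alignment problem described above can be sketched as follows, assuming a toy greedy longest-match WordPiece splitter and a made-up vocabulary. The names `wordpiece` and `align_labels` and the `<PAD>` label are hypothetical and are not taken from DeLFT. In this sketch the first sub-token of each word keeps the word's label and the remaining sub-tokens receive the padding label, which is only one of several possible conventions.

```python
from typing import List, Set, Tuple


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first split of one word into WordPiece sub-tokens."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the "##" continuation prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No vocabulary entry matches: the whole word becomes unknown.
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


def align_labels(words: List[str], labels: List[str], vocab: Set[str],
                 pad_label: str = "<PAD>") -> Tuple[List[str], List[str]]:
    """Expand word-level labels so that there is one label per sub-token."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    sub_tokens: List[str] = []
    sub_labels: List[str] = []
    for word, label in zip(words, labels):
        pieces = wordpiece(word, vocab)
        sub_tokens.extend(pieces)
        # First sub-token keeps the label, the others get the padding label.
        sub_labels.extend([label] + [pad_label] * (len(pieces) - 1))
    return sub_tokens, sub_labels


# Toy vocabulary and labels, for illustration only.
toy_vocab = {"seg", "##ment", "##ation", "of", "text"}
tokens, tags = align_labels(["segmentation", "of", "text"],
                            ["B-TITLE", "I-TITLE", "I-TITLE"], toy_vocab)
```

At prediction time, the labels of the padded sub-token positions would be ignored so that one label per original word `w_i` is recovered.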