Patrice Lopez

Results 77 issues of Patrice Lopez

As we are moving to more heterogeneous sources, crossref is one bibliographical record among others. To keep everything well separated and avoid destructive merging, the headache of unified representations and...

enhancement

We currently use Snappy for records stored in LMDB. Other compression methods might be better suited to small objects, giving a higher compression ratio and faster decompression (in...

enhancement
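As an illustration of why small objects are a special case for compression, here is a minimal sketch using DEFLATE with a preset dictionary from Python's standard `zlib` module. This is not Snappy and not necessarily the method the issue has in mind; the dictionary content, the record fields and the function names are all hypothetical.

```python
import json
import zlib

# Hypothetical preset dictionary: byte sequences expected to recur in most
# records. In practice it would be built from a sample of real records.
PRESET_DICT = (
    b'"container-title": ["], "author": [{"family": "'
    b'", "given": ""title": ["{"DOI": "10.'
)


def compress_record(record: dict, zdict: bytes = PRESET_DICT) -> bytes:
    """Serialize a small record to JSON and compress it with a preset dictionary."""
    raw = json.dumps(record, sort_keys=True).encode("utf-8")
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(raw) + compressor.flush()


def decompress_record(blob: bytes, zdict: bytes = PRESET_DICT) -> dict:
    """Reverse compress_record; the same dictionary must be supplied."""
    decompressor = zlib.decompressobj(zdict=zdict)
    raw = decompressor.decompress(blob) + decompressor.flush()
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    # Made-up record, for illustration only.
    record = {
        "DOI": "10.1234/example.1",
        "title": ["An example title"],
        "author": [{"family": "Doe", "given": "Jane"}],
        "container-title": ["Example Journal"],
    }
    blob = compress_record(record)
    print(len(json.dumps(record)), "->", len(blob), "bytes")
    assert decompress_record(blob) == record
```

A shared dictionary lets the compressor reference common substrings that a short record does not repeat on its own. Whether this beats Snappy on ratio or speed for these records would have to be measured.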

Matching the full raw reference string gives the best accuracy but is also the most expensive, so scaling this kind of query means adding many Elasticsearch nodes. It...

enhancement

For the glutton web extension, it would be good to have a service that provides both the OA PDF access (as the current service/oa?) and the ISTEX ID when available.

enhancement

Keeping in mind that caching queries/results in LMDB makes sense for matching queries only.

enhancement

Some abstracts are present in Crossref metadata, but of course many are available via MEDLINE data with the nice MeSH classes. The sub-package `pubmed-glutton` parses all MEDLINE data and map...

enhancement

The Chicago reference style has this awful usage of a 3-em dash to repeat one, several, or all of the authors of the previous reference. Although this practice seems to be removed or...
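To make the 3-em dash convention concrete, here is a minimal sketch that expands a leading 3-em dash into the authors of the previous reference. It only covers the case where the dash stands for all authors, and it assumes the author block is the text before the first ". " in a reference. That heuristic, the function names and the sample references are hypothetical, and this is not how Grobid handles the problem.

```python
import re

# Three em dashes (U+2014) at the start of a reference, optionally followed by a period.
REPEAT_MARK = re.compile(r"^\s*\u2014{3}\.?\s*")


def author_block(reference: str) -> str:
    """Return the text before the first '. ' (assumed here to be the author list)."""
    head, sep, _ = reference.partition(". ")
    return head if sep else ""


def expand_repeated_authors(references):
    """Replace a leading 3-em dash with the author block of the previous reference."""
    expanded = []
    previous_authors = ""
    for ref in references:
        match = REPEAT_MARK.match(ref)
        if match:
            # Only expand when an earlier reference gave us an author block.
            if previous_authors:
                ref = previous_authors + ". " + ref[match.end():]
        else:
            previous_authors = author_block(ref) or previous_authors
        expanded.append(ref)
    return expanded


if __name__ == "__main__":
    # Made-up references, for illustration only.
    refs = [
        "Doe, Jane. 2001. First title.",
        "\u2014\u2014\u2014. 2003. Second title.",
    ]
    for line in expand_repeated_authors(refs):
        print(line)
```

Author lists with abbreviated initials ("Doe, J. A.") would break the first-period heuristic, and partial repetition (one or several authors) needs real author segmentation rather than string splitting.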

Unstable and work in progress! (follow-up of the `fix-vector-graphics` branch) This is a working version for a revision of the cascade process in Grobid, which changes the overall approach for...

The average time spent in FastMatcher (around 15% of the whole runtime) is particularly high for journal names (11.3%). There are apparently too many short abbreviated competing journal names...

enhancement

For affiliations extracted by Grobid, it would be interesting to try to validate/correct the affiliation strings using the affiliation information possibly present in CrossRef records. In addition, when ROR are available...

enhancement