Patrice Lopez
As we move to more heterogeneous sources, the Crossref record is just one bibliographic record among others. To keep everything well separated and avoid destructive merging, the headache of unified representations and...
We currently use Snappy for the records stored in LMDB. Other compression methods might be better suited to small objects, giving a higher compression ratio and faster decompression (in...
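One family of methods that tends to help with small objects is compression with a shared preset dictionary. The sketch below is only an illustration of that idea, not what glutton does: it uses `zlib` from the Python standard library as a stand-in (neither Snappy nor any specific alternative), and the sample record, `ZDICT`, `encode`, `compress` and `decompress` are hypothetical names introduced for the example.

```python
import json
import zlib


def encode(record: dict) -> bytes:
    """Serialize a record deterministically before compression."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


# Hypothetical sample record; real bibliographic records are larger.
SAMPLE = {
    "DOI": "10.1000/example.1",
    "type": "journal-article",
    "container-title": ["Journal of Examples"],
    "title": ["A first example"],
}

# Assumption: the shared dictionary is built from representative records.
ZDICT = encode(SAMPLE)


def compress(payload: bytes, zdict: bytes = b"") -> bytes:
    """Compress one small object, optionally with a preset dictionary."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(payload) + comp.flush()


def decompress(blob: bytes, zdict: bytes = b"") -> bytes:
    """Inverse of compress(); the same dictionary must be supplied."""
    if zdict:
        decomp = zlib.decompressobj(zdict=zdict)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()
```

Comparing `len(compress(payload))` with `len(compress(payload, ZDICT))` on records that share field names with the dictionary gives a rough idea of what a shared dictionary can bring for small objects; actual ratios and speeds would have to be benchmarked on real records.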
Matching the full raw reference string provides the best accuracy but is also the most expensive, so scaling this kind of query requires adding many Elasticsearch nodes. It...
For the glutton web extension, it would be good to have a service that provides both the OA PDF access (as the current service/oa?) and the ISTEX ID when available.
Keep in mind that caching queries/results in LMDB makes sense for matching queries only.
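A minimal sketch of that constraint, under stated assumptions: a plain `dict` stands in for LMDB, and the `MatchingCache` class, the `"match"` query kind and the `backend` callable are hypothetical names, not part of glutton. Only matching queries go through the cache; any other kind of query (for instance a lookup by identifier) is passed straight to the backend.

```python
from typing import Callable, Dict


class MatchingCache:
    """Cache results of matching queries only (dict used in place of LMDB)."""

    def __init__(self, backend: Callable[[str, str], dict]) -> None:
        self._backend = backend
        self._store: Dict[str, dict] = {}
        self.backend_calls = 0  # counter kept only to make the behavior visible

    @staticmethod
    def _key(value: str) -> str:
        # Normalize case and whitespace so trivially different strings share a key.
        return " ".join(value.lower().split())

    def query(self, kind: str, value: str) -> dict:
        if kind != "match":
            # Non-matching queries bypass the cache entirely.
            self.backend_calls += 1
            return self._backend(kind, value)
        key = self._key(value)
        if key not in self._store:
            self.backend_calls += 1
            self._store[key] = self._backend(kind, value)
        return self._store[key]
```

With this shape, repeating the same raw reference string hits the backend once, while repeated identifier lookups always go to the backend.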
Some abstracts are present in Crossref metadata, but of course many more are available via MEDLINE data, with the nice MeSH classes. The sub-package `pubmed-glutton` parses all MEDLINE data and maps...
The Chicago reference style has this awful use of a 3-em dash to repeat one, several, or all of the authors of the previous reference. Although this practice seems to be removed or...
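To make the problem concrete, here is a rough sketch of one possible pre-processing step, covering only the simplest case where the dash stands for all authors of the previous reference. It is not how Grobid handles it. The author extraction (`naive_authors`, everything before the first period followed by a space) is a deliberately naive, hypothetical heuristic that breaks on initials, and `expand_repeated_authors` is likewise a name made up for the example.

```python
import re
from typing import List

# Three em dashes (U+2014), or three or more ASCII hyphens typed as a substitute.
DASH_PREFIX = re.compile(r"^\s*(?:\u2014{3}|-{3,})")


def naive_authors(reference: str) -> str:
    """Hypothetical heuristic: the author block is the text before the first '. '."""
    head, _, _ = reference.partition(". ")
    return head


def expand_repeated_authors(references: List[str]) -> List[str]:
    """Replace a leading 3-em dash with the author block of the previous reference."""
    expanded: List[str] = []
    previous_authors = ""
    for ref in references:
        match = DASH_PREFIX.match(ref)
        if match:
            if previous_authors:
                # Same authors as before, so previous_authors stays unchanged.
                ref = previous_authors + ref[match.end():]
            # With no previous reference, the dash is left untouched.
        else:
            previous_authors = naive_authors(ref)
        expanded.append(ref)
    return expanded
```

The partial cases (dash replacing only one or several of the authors) are not handled here and would need an actual author segmentation rather than a string prefix.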
Unstable and work in progress! (follow-up of the `fix-vector-graphics` branch) This is a working version for a revision of the cascade process in Grobid, which changes the overall approach for...
The average time spent by FastMatcher (around 15% of the whole runtime) is particularly important for the journal names (11.3%). There are apparently too many short abbreviated concurrent journal names...
Starting from the affiliations extracted by Grobid, it would be interesting to try to validate/correct the affiliation strings using the affiliation information possibly present in CrossRef records. In addition, when ROR are available...