bnewbold
Here is what I got when I tried to register a brand new `QUADERNO A4 (Gen. 2)` (`FMVDP41`). Note that I have redacted the serial number, and added a single...
In our use of GROBID, we have machines with a reasonable number of cores and RAM (e.g., 30 cores, 40 GB RAM), but poor disk I/O. This makes it important to...
The current arxiv.org identifier matcher regex requires an "arXiv:" prefix. This misses some unambiguous old-style identifiers in short citations, like these examples: ``` B.A. Dobrescu, hep-ph/9510424. K.R. Dienes, C. Kolda...
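To make the request concrete, here is a rough sketch of a pattern that accepts old-style identifiers without the "arXiv:" prefix. It is written in Python for illustration only and is not GROBID's actual matcher; the `OLD_ARCHIVES` list and the `find_old_style_ids` helper are hypothetical names, and the archive list is deliberately partial.

```python
import re

# Partial, illustrative list of old-style archive names (an assumption, not exhaustive).
OLD_ARCHIVES = [
    "astro-ph", "cond-mat", "gr-qc", "hep-ex", "hep-lat", "hep-ph", "hep-th",
    "math-ph", "nucl-ex", "nucl-th", "quant-ph", "physics", "math", "cs", "nlin",
]

# archive name, optional two-letter subject class (e.g. math.GT), slash, 7 digits.
# The lookbehind rejects matches glued to a longer token or a URL path;
# the lookahead rejects runs of more than 7 digits.
OLD_STYLE_ARXIV = re.compile(
    r"(?<![\w/.-])"
    r"((?:" + "|".join(re.escape(a) for a in OLD_ARCHIVES) + r")"
    r"(?:\.[A-Z]{2})?/\d{7})"
    r"(?!\d)"
)


def find_old_style_ids(text):
    """Return old-style arXiv identifiers found in text, with or without a prefix."""
    return OLD_STYLE_ARXIV.findall(text)
```

Restricting the match to known archive names is what keeps a prefix-less pattern from firing on arbitrary `word/1234567` strings; a real implementation would need the complete archive list.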
Opening a tracking ticket to discuss this security issue (`CVE-2021-44228`) in the context of GROBID. I'm not even sure GROBID is impacted, but figure it would be good to document...
This is a pretty obscure corner case (parsing a mangled string), so may not be a priority to reproduce and fix. But it did come up in real use of...
Tests are still failing in this branch. There is partial progress in the form of regression tests, and handling of some NullPointerExceptions. Related to: https://github.com/kermitt2/grobid/issues/849
Histograms could be done in numerous ways. Here are some thoughts: - like most of xsv, should operate over huge tables with a single pass - .idx files could store...
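As one possible reading of the single-pass constraint, the sketch below bins values by a width chosen up front, so no minimum or maximum has to be known before streaming. It is Python for illustration, not xsv code; `streaming_histogram` and its parameters are hypothetical.

```python
import math
from collections import Counter


def streaming_histogram(rows, column, bin_width):
    """Count values per fixed-width bin in one pass over an iterable of dict rows.

    Returns a dict mapping each bin's lower edge to its count.
    Rows with missing or non-numeric values are skipped.
    """
    counts = Counter()
    for row in rows:
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue
        if math.isnan(value) or math.isinf(value):
            continue
        # The bin index depends only on the value itself, so memory use
        # scales with the number of occupied bins, not the number of rows.
        counts[math.floor(value / bin_width)] += 1
    return {index * bin_width: n for index, n in sorted(counts.items())}
```

It can be fed directly from `csv.DictReader`, for example `streaming_histogram(csv.DictReader(open("data.csv")), "x", 5)`, where the file and column names are placeholders.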
This elasticsearch blog post implies that doing batch indexing of documents all going to the same shard at a time improves performance: https://www.elastic.co/blog/how-kenna-security-speeds-up-elasticsearch-indexing-at-scale-part-1 The feature request for esbulk would be...
I twice attempted to import over 140 million documents into a local, single-node ES 6.8 cluster using a command like the following: ``` zcat /srv/fatcat/snapshots/release_export_expanded.json.gz | pv -l | parallel...
If I try `ia upload SOME-ITEM .` when there is a file with a dot in the current directory, I get: error uploading /./RDS_ios.iso.sha: Invalid Argument - when key is...