Marcel Bollmann
FWIW, I don't think attaching the P16 abstracts to your issue worked, @abhinavkashyap
Re the hyphenation, I browsed through P16 and my approach currently fails for some words that `wordfreq` apparently doesn't know about:

```
annota-tors corefer-ence geospa-tional la-belers reg-ularizer rerank-ing sum-marizer system-aticity...
```
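To illustrate the failure mode, here is a minimal sketch of a lexicon-based dehyphenation check. It is not the actual code: the function name `dehyphenate` and the tiny `KNOWN_WORDS` set are hypothetical, and the set only stands in for a real frequency lookup such as `wordfreq`.

```python
def dehyphenate(token, known_words):
    """Join a hyphenated token only if the joined form is a known word."""
    if "-" not in token:
        return token
    joined = token.replace("-", "")
    if joined.lower() in known_words:
        return joined
    # Unknown joined form: keep the hyphen rather than risk a wrong merge.
    return token


# Hypothetical stand-in for a word-frequency lexicon.
KNOWN_WORDS = {"reference", "system", "regular", "summary"}

for tok in ["refer-ence", "corefer-ence", "state-of-the-art"]:
    print(tok, "->", dehyphenate(tok, KNOWN_WORDS))
```

A word like `corefer-ence` stays hyphenated here simply because the joined form is missing from the lexicon, which is the kind of miss described above.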
Thanks @abhinavkashyap! They generally look very good to me. I compared them with my own Tika pipeline, and they're mostly identical, and also appear to have the same problems; e.g.,...
Yes, I was about to write the same thing while you were posting this, @akoehn. :) Funnily enough, even the currently generated nonsense link on the website resolves: - http://dx.doi.org/http://hdl.handle.net/2065/29098...
Making a new field would be very little work, it just produces extra code for what currently is a rare exception. The `` way would also work, but I find...
### Caching

I've tried adding serialization functionality to the Anthology class, so it could potentially be instantiated from a cached file instead of loading from the XML/YAML. This adds quite...
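For reference, the general idea of loading from a cached file can be sketched as below. This is only an illustration under my own assumptions, not the serialization code itself: `load_with_cache` and `slow_parse` are hypothetical names, the cache format is plain `pickle`, and the cache is treated as valid when it is at least as new as the source file.

```python
import os
import pickle
import tempfile


def load_with_cache(source_path, cache_path, parse):
    """Return parse(source_path), reusing a pickled result if it is not older than the source."""
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = parse(source_path)
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data


calls = []  # records how often the (hypothetical) slow parser actually runs


def slow_parse(path):
    calls.append(path)
    with open(path, encoding="utf-8") as f:
        return {"titles": f.read().splitlines()}


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "papers.txt")
    cache = os.path.join(tmp, "papers.pkl")
    with open(src, "w", encoding="utf-8") as f:
        f.write("Paper A\nPaper B\n")
    first = load_with_cache(src, cache, slow_parse)   # parses and writes the cache
    second = load_with_cache(src, cache, slow_parse)  # served from the cache
```

The second call never reaches the parser, so the cost of the initial load is paid once until the source file changes.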
Some more thoughts (mainly for myself :)) and investigation: **As a first step**, I now think the most promising route is to start optimizing the code we already have; profiling...
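As a rough sketch of what such a profiling pass can look like, the snippet below uses the standard-library `cProfile` (swapped in here so the example has no dependencies; it is not the profiler used for the reports in this thread). `build_index` is a hypothetical stand-in for an expensive loading step.

```python
import cProfile
import io
import pstats


def build_index(n):
    """Hypothetical stand-in for an expensive loading step."""
    return {i: str(i) for i in range(n)}


profiler = cProfile.Profile()
profiler.enable()
index = build_index(50_000)
profiler.disable()

# Collect the five most expensive entries by cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the functions that dominate the overall run at the top, which is usually where optimization pays off first.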
Here's an updated report from pyinstrument after merging #1473 (caveat: I don't remember which device I ran the previous test on :)):  Besides already looking much faster, I suspect...
If you can send me an updated file list, I'll update this one. Might make sense to first figure out which problems are actual problems.
Updated the list. This and the recent corrections (including my unmerged ones in #267) remove about 54 files from the list.