Daniel Ecer
> The library requires Tensorflow and I don't have access to a Mac for testing Apple-specific setups. I'm not comfortable with making hard dependencies optional, especially since Python packaging is...
Is that with re-trained models?
I was wondering whether the "bug" was acting as a proxy for superscript or subscript.
Just a thought: maybe the superscript and subscript font styles aren't always detected reliably, whereas the font size is always available, and so maybe this incorrect calculation happens to...
Hi Patrice, thank you for your response. > This is easy to fix and get it a bit more general, we simply need to use the average/median line space separation...
BTW, a similar area of code (`assignGraphicObjectsToFigures`: https://github.com/kermitt2/grobid/blob/0.6.1/grobid-core/src/main/java/org/grobid/core/document/Document.java#L1153-L1159) also seems to be responsible for, in my case, including the label / header in the description. I haven't yet reproduced that...
Just to get an idea of what difference this can make, here is an evaluation over the bioRxiv 10k validation dataset (first 200 samples): The last two bars are using the same...
I guess one counterexample, where rules or an additional model input could help: `475335v1` (DOI: [10.1101/475335](https://doi.org/10.1101/475335)) (bioRxiv 10k train example) PDF  bioRxiv JATS XML ```xml Epidemic synchrony and...
For what it's worth, in the bioRxiv dataset there doesn't seem to be a separate title as far as I am aware. I am just evaluating `label` and `caption` /...
I could look into adding something to the training data... In general I prefer more flexibility in terms of data being used by the model, closer to the model itself....