Daniel Ecer comments

Results: 115 comments of Daniel Ecer

Error installing "Apple Silicon specific dependencies"

> The library requires TensorFlow and I don't have access to a Mac for testing Apple-specific setups. I'm not comfortable with making hard dependencies optional, especially since Python packaging is...

Header model, relative font size includes spaces with a zero font size

Is that with re-trained models?

Header model, relative font size includes spaces with a zero font size

I was wondering whether the "bug" might have been providing a proxy for superscript or subscript.

Header model, relative font size includes spaces with a zero font size

Just a thought: Maybe the superscript and subscript font styles aren't always detected as well. Whereas the font size is always available, and so maybe this incorrect calculation happens to...
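
To illustrate how zero-sized space tokens could shift a relative font size, here is a minimal sketch. It assumes the relative size is a min-max scaling of font sizes over the tokens of a block; that is an assumption for illustration, not necessarily how the header model computes the feature. The `Token` type, the function name `relative_font_sizes`, and the `skip_zero_size_spaces` flag are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical token representation: (text, font size in points)
Token = Tuple[str, float]


def relative_font_sizes(
    tokens: Sequence[Token],
    skip_zero_size_spaces: bool = True,
) -> List[Optional[float]]:
    """Scale each token's font size into the range 0..1 (min-max scaling).

    Assumption: whitespace tokens may carry a font size of 0. When
    skip_zero_size_spaces is True, those tokens are left out of the
    min/max calculation and get None as their relative size.
    """
    def is_skipped(token: Token) -> bool:
        text, size = token
        return skip_zero_size_spaces and not text.strip() and size == 0

    sizes = [size for text, size in tokens if not is_skipped((text, size))]
    if not sizes:
        return [None] * len(tokens)
    smallest, largest = min(sizes), max(sizes)
    span = largest - smallest

    result: List[Optional[float]] = []
    for token in tokens:
        if is_skipped(token):
            result.append(None)
        elif span == 0:
            # All considered tokens share one font size
            result.append(0.0)
        else:
            result.append((token[1] - smallest) / span)
    return result


# Hypothetical example: a title word, a zero-sized space, body text, a small "2"
example = [("Title", 18.0), (" ", 0.0), ("text", 10.0), ("2", 6.0)]
with_spaces = relative_font_sizes(example, skip_zero_size_spaces=False)
without_spaces = relative_font_sizes(example, skip_zero_size_spaces=True)
```

With the zero-sized space included, the minimum becomes 0, so the small "2" no longer sits at the bottom of the scale. Under this assumed scaling, that is the kind of shift that could make the value act as a proxy for superscript or subscript.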

Only first line of figure description extracted if distance between lines deemed too large

Hi Patrice, thank you for your response. > This is easy to fix and get it a bit more general, we simply need to use the average/median line space separation...
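
The quoted suggestion (using the average/median line space separation) can be sketched roughly as follows. This is not GROBID's implementation: the function name `group_caption_lines`, the `tolerance` factor, and the use of line-top y coordinates are assumptions made for illustration. The median here is taken over the candidate lines themselves; a document-wide median may be a better reference.

```python
from statistics import median
from typing import List, Sequence


def group_caption_lines(
    line_tops: Sequence[float],
    tolerance: float = 1.5,
) -> List[int]:
    """Return the indices of the lines forming the first contiguous block.

    line_tops: y coordinate of the top of each line, ordered top to bottom.
    tolerance: hypothetical factor; a gap larger than
        tolerance * median gap is treated as the end of the block.
    """
    if len(line_tops) < 2:
        return list(range(len(line_tops)))

    # Vertical distance between each pair of consecutive lines
    gaps = [below - above for above, below in zip(line_tops, line_tops[1:])]
    typical_gap = median(gaps)

    block = [0]
    for index, gap in enumerate(gaps, start=1):
        if gap > tolerance * typical_gap:
            break  # unusually large gap: the block ends before this line
        block.append(index)
    return block


# Hypothetical layout: four evenly spaced caption lines, then a large gap
caption_line_indices = group_caption_lines([100, 112, 124, 136, 180, 192])
```

Comparing each gap against the median rather than a fixed distance means documents with wider line spacing are not cut off after the first line.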

Only first line of figure description extracted if distance between lines deemed too large

BTW, a similar area of code (`assignGraphicObjectsToFigures`: https://github.com/kermitt2/grobid/blob/0.6.1/grobid-core/src/main/java/org/grobid/core/document/Document.java#L1153-L1159) also seems to be responsible, in my case, for including the label / header in the description. I haven't yet reproduced that...

Only first line of figure description extracted if distance between lines deemed too large

Just to get an idea of what difference this can make, here is an evaluation over the bioRxiv 10k validation dataset (first 200 samples): ![image](https://user-images.githubusercontent.com/1016473/104202071-5d8bdb00-5422-11eb-82a7-2eaf7f110a2a.png) The last two bars are using the same...

Only first line of figure description extracted if distance between lines deemed too large

I guess one counter example, where rules or an additional model input could help: `475335v1` (DOI: [10.1101/475335](https://doi.org/10.1101/475335)) (bioRxiv 10k train example) PDF ![image](https://user-images.githubusercontent.com/1016473/104206055-04727600-5427-11eb-95e1-e4540fe228c7.png) bioRxiv JATS XML ```xml Epidemic synchrony and...

Only first line of figure description extracted if distance between lines deemed too large

For what it's worth, in the bioRxiv dataset there doesn't seem to be a separate title as far as I am aware. I am just evaluating `label` and `caption` /...

Only first line of figure description extracted if distance between lines deemed too large

I could look into adding something to the training data... In general I prefer more flexibility in terms of data being used by the model, closer to the model itself....