Patrice Lopez
Nice work, thanks! Using GloVe embeddings as indicated and increasing the number of epochs to 70 without touching anything else, I obtained an f-score of 89.16 averaged over 10...
An error case for accent composition in pdfalto, see https://github.com/kermitt2/grobid/issues/906 for the pdf.
By default, pdfalto extracts both embedded bitmaps and vector graphics. The option -noImage disables extraction of both graphics types. However, we might still want the vector graphics extracted and not the...
We currently use simple formatting patterns like `%1.4f` to serialize the coordinates in the XML and SVG files (avoiding `e` formatting, which can introduce an exponent). The drawback is that...
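To make the formatting behavior concrete, here is a minimal sketch in Python, whose `%` operator follows the same printf-style conventions. The helper name `format_coord` is hypothetical and is not taken from pdfalto; the sketch only shows how a `%1.4f` pattern behaves compared with `%e`.

```python
def format_coord(value: float) -> str:
    """Serialize a coordinate in fixed-point notation with 4 decimals.

    Hypothetical helper for illustration only, not pdfalto code.
    """
    return "%1.4f" % value


# Fixed-point output never contains an exponent.
print(format_coord(12.3456789))
# Very small values keep the fixed-point form; digits past the 4th decimal are dropped.
print(format_coord(0.0000001))
# For comparison, `e` formatting switches to exponent notation.
print("%e" % 0.0000001)
```

Fixed-point patterns keep the output free of exponents, at the cost of rounding every value to the chosen number of decimals.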
For some reason, the rotation attribute, which was present in pdf2xml and is still computed, is not currently output in the ALTO file. If I remember correctly, we though...
> It would be great if you would consider option to include Glyph/Character level in the output.

For the moment, only token-level output is implemented.
The quantity CRF model recognizes numerical expressions with exponents on 10 (in particular distorted ones due to PDF text extraction): ![example_exponent](https://cloud.githubusercontent.com/assets/2340795/13862621/c63b7d90-ec94-11e5-8c5b-9f70a2385489.png) However, we are not currently parsing them (in their...
In this example, the raw value looks good but the parsed value is not very exciting. ![Screenshot from 2019-11-26 22-32-41](https://user-images.githubusercontent.com/2340795/69675821-d786c680-109f-11ea-8d40-48271692d192.png) [1001._0908.0054.pdf](https://github.com/kermitt2/grobid-quantities/files/3894162/1001._0908.0054.pdf)
Values can be entirely numerical, use powers of 10 (see #7) or an exponent symbol (0.2E-4), number words ("twenty") (see #8), dates/time expressions ("October 19, 2014 at 20:09 TDB") (see #12)....
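As an illustration of two of these forms, the following Python sketch normalizes a raw value written either with an exponent symbol (`0.2E-4`) or as an explicit power of 10 (`1.5 x 10^-3`). It is not the grobid-quantities implementation: the function name `parse_numeric_value`, the `POWER_OF_TEN` pattern, and the assumption that the exponent is introduced by a `^` are all hypothetical. Number words and date/time expressions are deliberately left unhandled and return `None`.

```python
import re
from typing import Optional

# Hypothetical pattern for "mantissa x 10^exponent", e.g. "1.5 x 10^-3".
# Assumes the exponent is marked with "^"; other layouts are not covered.
POWER_OF_TEN = re.compile(
    r"^\s*([+-]?\d+(?:\.\d+)?)\s*[x×*]\s*10\s*\^\s*([+-]?\d+)\s*$"
)


def parse_numeric_value(raw: str) -> Optional[float]:
    """Return the float value of a raw numeric string, or None if unsupported."""
    text = raw.strip().replace("\u2212", "-")  # normalize the Unicode minus sign
    match = POWER_OF_TEN.match(text)
    if match:
        mantissa, exponent = match.groups()
        return float(mantissa) * 10 ** int(exponent)
    try:
        # float() already understands plain numbers and the exponent symbol (0.2E-4).
        return float(text)
    except ValueError:
        # Number words, dates and other forms need dedicated handling.
        return None


print(parse_numeric_value("0.2E-4"))
print(parse_numeric_value("1.5 x 10^-3"))
print(parse_numeric_value("twenty"))
```

Returning `None` for unsupported forms keeps the raw value usable even when no parsed value can be produced.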
e.g. **silicon nitride powder** for the measurement **10kg** in: ``` A mixture of 10kg of _silicon nitride powder_ was charged into the mixing chamber 20 of the mixing vessel 18....