Patrice Lopez
Hi @hallsten! Thank you for the issue. Yes, the soft hyphens should not be omitted, but normally they should not be visible except at the end of a line....
Yes, I see the same. It's an external dependency, so we can't do anything.
Hi @MatthieuMoullecDev! This client does indeed take a directory as input/output, as documented, because it is intended for batch processing of many files. For me this client is a basis...
Same as #63, should be fixed as well
Do you know which style can be obtained from the font full name? The way to get italic or bold information is actually very hacky https://github.com/kermitt2/pdfalto/blob/master/src/XmlAltoOutputDev.cc#L598 but it's the normal...
Hi Daniel, For the white image, it's probably the same as what I raised here: https://github.com/kermitt2/grobid/issues/826 I think these are the "Soft-Mask" images of the [PDF specifications](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf) (11.6.5.3 Soft-Mask Images,...
Many thanks @mauvilsa for the issue and the PR! I've started to review the options. You're absolutely right about the `-blocks` option, it does not make sense to have...
So I plan the following: - create a release for current master to `0.3`, because there were quite a few important fixes lately (in particular #54) and some additions like...
Thank you @giancarlobi! Indeed the `xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"` is required for the `xsi` namespace (it looks obvious when I look at it now, but you know xml...). ``` > xmllint --schema...
Duplicate of #109? Normally the rotation information is still calculated and just needs to be serialized properly in ALTO.