pdfalto
Negative coordinates
Hi @deepseek !
Thank you for the issue and the very clear example.
I have indeed been able to reproduce negative coordinates for the TextBlock element in your example. The TextLine elements in the block are correct. I will review how these block coordinates are calculated; probably an absolute value is simply missing.
Is this problem solved? I am still seeing negative values for the block width and height.
Hi @yueyub !
I think it was solved. Do you have a reproducible example? It would help a lot.
It's this paper: https://arxiv.org/abs/2207.04630, but I think it may apply to any arXiv paper.
After extracting the XML file, search for "arXiv:2207.04630v2". It's on the first page, and the block coordinates are:
<TextBlock ID="p1_b17" HPOS="18.3400" VPOS="443.790" HEIGHT="-443.79" WIDTH ="-18.340">
How should I interpret the negative height and width?
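To check whether a given ALTO file is affected, a small script can list every TextBlock whose WIDTH or HEIGHT is negative. This is only a sketch: the sample XML is a simplified, hypothetical fragment (the `p1_b16` block and the bare `alto` wrapper are made up; only the `p1_b17` line is taken from the report above), and `find_negative_blocks` is an illustrative helper, not part of pdfalto.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified ALTO fragment. Only the p1_b17 attributes come
# from the report in this thread; the rest is invented for illustration.
SAMPLE = """<alto><Layout><Page ID="Page1"><PrintSpace>
<TextBlock ID="p1_b16" HPOS="72.0" VPOS="90.5" HEIGHT="12.0" WIDTH="200.0"/>
<TextBlock ID="p1_b17" HPOS="18.3400" VPOS="443.790" HEIGHT="-443.79" WIDTH ="-18.340"/>
</PrintSpace></Page></Layout></alto>"""


def local_name(tag):
    # Strip an XML namespace prefix such as "{http://...}TextBlock".
    return tag.rsplit("}", 1)[-1]


def find_negative_blocks(root):
    """Return (ID, WIDTH, HEIGHT) for every TextBlock with a negative size."""
    bad = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextBlock":
            continue
        width = float(elem.get("WIDTH", "0"))
        height = float(elem.get("HEIGHT", "0"))
        if width < 0 or height < 0:
            bad.append((elem.get("ID"), width, height))
    return bad


root = ET.fromstring(SAMPLE)
negative_blocks = find_negative_blocks(root)
for block_id, width, height in negative_blocks:
    print(block_id, width, height)
```

For a real file, `ET.parse(path).getroot()` can replace `ET.fromstring(SAMPLE)`; matching on the local tag name keeps the check independent of the ALTO namespace version.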
On Wed, Aug 3, 2022 at 2:48 AM Patrice Lopez wrote:
Hi @yueyub !
I think it was solved, do you have a reproducible example? It would help a lot.
Thank you very much @yueyub for the error case. I think it's just a bug (a missing absolute value for the TextBlock element only). I can reproduce it with your example for the @WIDTH attribute, and I will fix it.
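Until a fix is released, one possible workaround is to recompute each TextBlock box from its TextLine children, since the TextLine coordinates are reported as correct earlier in this thread. The following is a minimal post-processing sketch under that assumption; it is not how pdfalto computes block coordinates, and the sample values and the `repair_block` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical block with negative dimensions and two well-formed lines.
SAMPLE = """<TextBlock ID="b1" HPOS="10.0" VPOS="20.0" WIDTH="-10.0" HEIGHT="-20.0">
<TextLine HPOS="10.0" VPOS="20.0" WIDTH="100.0" HEIGHT="12.0"/>
<TextLine HPOS="10.0" VPOS="34.0" WIDTH="80.0" HEIGHT="12.0"/>
</TextBlock>"""


def local_name(tag):
    # Strip an XML namespace prefix such as "{http://...}TextLine".
    return tag.rsplit("}", 1)[-1]


def repair_block(block):
    """Overwrite the block box with the union of its TextLine boxes.

    Returns False (and changes nothing) if the block has no TextLine.
    """
    lines = [e for e in block.iter() if local_name(e.tag) == "TextLine"]
    if not lines:
        return False
    x_min = min(float(l.get("HPOS")) for l in lines)
    y_min = min(float(l.get("VPOS")) for l in lines)
    x_max = max(float(l.get("HPOS")) + float(l.get("WIDTH")) for l in lines)
    y_max = max(float(l.get("VPOS")) + float(l.get("HEIGHT")) for l in lines)
    block.set("HPOS", f"{x_min:.3f}")
    block.set("VPOS", f"{y_min:.3f}")
    # max minus min is non-negative by construction.
    block.set("WIDTH", f"{x_max - x_min:.3f}")
    block.set("HEIGHT", f"{y_max - y_min:.3f}")
    return True


block = ET.fromstring(SAMPLE)
repaired = repair_block(block)
print(block.attrib)
```

Taking the union of the line boxes avoids guessing which sign or offset went wrong in the block itself, at the cost of ignoring any padding the original block box might have included.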