tabulapdf
Minus signs not appearing in extracted table
Please specify whether your issue is about:
- [X] a possible bug
- [ ] a question about package functionality
- [X] a suggested code or documentation change, improvement to the code, or feature request
The problem concerns tables found in some journal articles, such as page 5 from this paper.
extract_tables(file = "schaefer & steklis am j primatol 2014.pdf", pages = 5)
This extracts the table, but every value in it comes out positive: the minus signs are dropped. This appears to be because the journal typeset them as a dash (shown as ^D in the PDF file) rather than a minus sign.
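One way to check what actually arrives in the extracted cells is to list the non-ASCII code points and, where a dash-like character did survive, map it to an ASCII minus before converting to numeric. The sketch below is only a rough illustration: `non_ascii_codepoints()` and `normalise_minus()` are hypothetical helper names (not part of tabulizer), and the toy matrix stands in for one element of the list returned by `extract_tables()`. It assumes the dash is still present in the output; if the sign has been dropped entirely, or replaced by a digit, this will not recover it.

```r
# Hypothetical helper: Unicode code points of every non-ASCII character
# found in a character matrix (e.g. one table from extract_tables()).
non_ascii_codepoints <- function(tab) {
  cps <- unlist(lapply(as.vector(tab), utf8ToInt))
  sort(unique(cps[!is.na(cps) & cps > 127L]))
}

# Hypothetical helper: replace common dash-like characters with "-".
# U+2212 minus sign, U+2010 hyphen, U+2012 figure dash,
# U+2013 en dash, U+2014 em dash.
normalise_minus <- function(tab) {
  gsub("[\u2212\u2010\u2012\u2013\u2014]", "-", tab)
}

# Toy stand-in for an extracted table (not real data from the paper).
tab <- matrix(c("\u22120.31", "0.12", "\u20130.05", "1.40"), nrow = 2)

non_ascii_codepoints(tab)            # which unusual characters are present
clean <- normalise_minus(tab)
as.numeric(clean)                    # values with their signs restored
```

If `non_ascii_codepoints()` returns nothing for a table that should contain negative values, the sign was lost before this step and the table needs checking against the PDF by hand.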
rJava loads with no errors.
sessionInfo() is as follows:
sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=en_GB.UTF-8
[9] LC_ADDRESS=en_GB.UTF-8 LC_TELEPHONE=en_GB.UTF-8
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tabulizer_0.2.2 rJava_0.9-10

loaded via a namespace (and not attached):
[1] tabulizerjars_1.0.1 compiler_3.5.2 tools_3.5.2 png_0.1-7
Same issue here (https://academic.oup.com/ajcn/article/91/3/635/4597164), when clicking on the PDF button and extracting TABLE 2 on page 637 (page number in the top right).
This is really dangerous because the dash (i.e. the "minus sign") is just converted to a "2", which you don't notice so easily unless you check every column of the table afterwards.
Did you find a solution/workaround?
Hi there,
I haven't found a solution. I needed to use tabulizer just once, so I didn't try.
I just thought it was serious enough to flag.
Best,
Alex
From: Markus Dumke
Sent: 12 February 2020 21:43:34
To: ropensci/tabulizer
Cc: WEISS ALEXANDER; Author
Subject: Re: [ropensci/tabulizer] Minus signs not appearing in extracted table (#104)
Same issue here (https://academic.oup.com/ajcn/article/91/3/635/4597164), when clicking on the pdf button and extracting TABLE 2 on page 637 (page number in the top right).