tabulapdf

Minus signs not appearing in extracted table

Open alexweissuk opened this issue 5 years ago • 2 comments

Please specify whether your issue is about:

  • [X] a possible bug
  • [ ] a question about package functionality
  • [X] a suggested code or documentation change, improvement to the code, or feature request

The problem concerns tables found in some journal articles, such as the table on page 5 of this paper (Schaefer & Steklis, American Journal of Primatology, 2014).

extract_tables(file = "schaefer & steklis am j primatol 2014.pdf", pages = 5)

extracts the table, but every value in it comes out positive: the negative signs are lost. This appears to be because the journal typeset negative numbers with a dash (which shows up as ^D in the PDF file) rather than a minus sign.
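A possible post-processing workaround, sketched under assumptions: if the dash survives extraction as some character (for example the ^D control character U+0004, a Unicode minus sign U+2212, or an en or em dash), it can be mapped back to an ASCII hyphen-minus before the cells are converted to numbers. The helpers `normalise_minus()` and `cells_to_numeric()` are hypothetical, not part of tabulizer, and the set of dash-like characters is a guess that should be checked against the actual extracted text. This cannot help when the sign is dropped entirely or rewritten as a digit.

```r
# Hypothetical helpers (not part of tabulizer): repair minus signs that were
# typeset as a dash-like glyph before converting extracted cells to numbers.

# Characters assumed to stand in for a minus sign in the extracted text:
# U+0004 (the ^D control character), U+2212 (minus sign),
# U+2013 (en dash) and U+2014 (em dash).
dash_like <- "[\u{0004}\u{2212}\u{2013}\u{2014}]"

normalise_minus <- function(x) {
  # Replace every dash-like character with an ASCII hyphen-minus
  gsub(dash_like, "-", x)
}

cells_to_numeric <- function(x) {
  # as.numeric() returns NA (with a warning) for any cell that is still not
  # a valid number after normalisation, so unexpected glyphs stay visible
  as.numeric(trimws(normalise_minus(x)))
}

# Example cells as they might come out of an extracted table
cells <- c("\u{2212}0.35", "\u{0004}0.12", "0.40", " \u{2013}1.5 ")
cells_to_numeric(cells)  # gives -0.35, -0.12, 0.40 and -1.50
```

With the list of character matrices that `extract_tables()` returns by default, something like `lapply(tables, function(m) { m[] <- normalise_minus(m); m })` would apply the repair cell by cell while keeping each table's shape.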

rJava loads with no errors.

sessionInfo() is as follows:

sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=en_GB.UTF-8
 [9] LC_ADDRESS=en_GB.UTF-8     LC_TELEPHONE=en_GB.UTF-8
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] tabulizer_0.2.2 rJava_0.9-10

loaded via a namespace (and not attached):
[1] tabulizerjars_1.0.1 compiler_3.5.2      tools_3.5.2
[4] png_0.1-7

alexweissuk, Feb 27 '19 15:02

Same issue here with this article (https://academic.oup.com/ajcn/article/91/3/635/4597164), after downloading it with the PDF button and extracting Table 2 on page 637 (the page number in the top right).

This is really dangerous because the dash (i.e. the minus sign) is silently converted to a "2", which is easy to miss unless you check every column of the table afterwards.

Did you find a solution/workaround?

markusdumke, Feb 12 '20 12:02

Hi there,

I haven't found a solution. I needed to use tabulizer just once, so I didn't try.

I just thought it was serious enough to flag.

Best,

Alex

alexweissuk, Feb 13 '20 12:02