
Possible Bug: converting parenthesis "(" into minus sign

Open • Munir-shah opened this issue 7 years ago • 1 comment

I am trying to extract a table from a PDF file using tabulizer. It runs fine and does extract the tables. However, my table has parentheses around some numbers, for example (20,076), and tabulizer interprets the "(" as a minus sign, so the extracted table contains -20,076 (a negative number). Can anybody explain why it does this and suggest a way to address the problem?

Code here:

    library(tabulizer)
    location <- "ccc.pdf"
    out <- extract_tables(location, output = "csv")

Input file and extracted table: ExtractedTable.xlsx, tabulizer-issue-minus-sign

Munir-shah • Apr 30 '18 03:04

That's strange. It's unlikely to be a tabulizer issue; the problem probably lies somewhere upstream, either in Tabula or in the PDF file itself. What happens if you copy and paste these figures from a PDF viewer? Could you provide an example PDF file?

tpaskhalis • May 11 '18 15:05
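
Until the source of the conversion is pinned down, one possible workaround is to post-process the extracted values in R. In accounting notation, parentheses conventionally mark negative amounts, so (20,076) and -20,076 carry the same numeric value. The sketch below is only an illustration in base R: `restore_parens` and `parse_accounting` are hypothetical helper names, not part of tabulizer or Tabula, and it assumes the extracted cells have been read into R as character strings.

```r
# Hypothetical helper (not part of tabulizer): put the parentheses back on
# values that were extracted with a leading minus sign, e.g. "-20,076".
restore_parens <- function(x) {
  x <- trimws(x)
  # A leading minus followed by digits, optional thousands separators,
  # and an optional decimal part.
  is_neg <- grepl("^-[[:space:]]*[0-9][0-9,]*(\\.[0-9]+)?$", x)
  x[is_neg] <- paste0("(", sub("^-[[:space:]]*", "", x[is_neg]), ")")
  x
}

# Hypothetical helper: turn either notation into a plain numeric value,
# treating both "(20,076)" and "-20,076" as negative.
parse_accounting <- function(x) {
  x <- trimws(x)
  negative <- grepl("^\\(.*\\)$", x) | grepl("^-", x)
  digits <- gsub("[^0-9.]", "", x)  # drop separators, signs, parentheses
  value <- suppressWarnings(as.numeric(digits))
  ifelse(negative, -value, value)
}

# Example with made-up cell values (assumed to be character strings):
cells <- c("-20,076", "1,234", "(5)")
restore_parens(cells)
parse_accounting(cells)
```

If the goal is a numeric column, `parse_accounting` is probably the more useful of the two; `restore_parens` only matters when the output has to match the PDF's presentation. To run the copy-and-paste check suggested above from within R, tabulizer's `extract_text()` can be used to look at the raw text of the page, which may help show whether the minus sign is already present before table extraction.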