tabulapdf
Integrating extract_tables with Shiny-app - no reactivity
Thanks for this awesome package. It works well on all the .pdf-documents I have tried it on. I do however have a problem integrating the extract_tables / extract_text functions with my own Shiny-app.
More specifically, the problem is that the fileInput function to upload files doesn't seem to recognize that a new file has been uploaded. This works instantly with other R functions like read.csv or pdf_text in the pdftools library.
This works with pdftools:
library(shiny)
library(pdftools)
shinyServer(function(input, output) {
  output$contents <- renderText({
    inFile <- input$file1
    if (is.null(inFile))
      return(NULL)
    pdf_text(inFile$datapath)
  })
})
This doesn't work with tabulizer:
library(shiny)
library(tabulizer)
shinyServer(function(input, output) {
  output$contents <- renderText({#renderTable
    inFile <- input$file1
    if (is.null(inFile))
      return(NULL)
    extract_text(inFile$datapath)
    #extract_tables(inFile$datapath)[[1]]
    #read.csv(inFile$datapath, header=input$header, sep=input$sep,
    #         quote=input$quote)
  })
})
ui.R is the same in both cases.
shinyUI(fluidPage(
  titlePanel("Uploading Files"),
  sidebarLayout(
    sidebarPanel(
      fileInput('file1', 'Choose PDF File',
                accept=c('.pdf'))#,c("application/pdf","adobe-portable-document-format",".pdf"))
    ),
    mainPanel(
      tableOutput('contents')
    )
  )
))
I'm working on a Mac Pro with OS X 10.11.4, R 3.2.3, and RStudio version 0.99.887.
Okay, I assume the issue has something to do with how tabulizer runs the file through localize_file(). Do you know what your shiny app gives as the value of inFile$datapath?
I see. inFile$datapath gives a temporary folder of the form "/var/folders/v4/gmb8wwgx1jj_94hdm2rc4bjh0000gn/T//RtmpYGRbyk/ced0c48b4af993d4d1ca7da4/0". Interestingly, this path seems to change when a new file is uploaded. The other values (name, size) in the input$file1 / inFile data frame are also updated, but the output displayed is not.
I have also tried different uses of reactive and reactiveValues statements to force a resetting of the cache, but have not succeeded so far.
And kudos for the swift reply.
Do you have any suggestions or workarounds to this problem?
If the app only needs to function in RStudio, you could use the base R command file.choose instead of fileInput. But I'm interested in a browser app.
I experience the same issue. When uploading a file, Shiny will write it to a subdirectory below a temporary directory. load_doc will call localize_file with copy = TRUE, which will make a copy of the uploaded file. In later calls, this file never seems to get replaced.
I cloned the repo and ran with some edits locally. If I change the copy parameter in load_doc from TRUE to FALSE (line 23 of utils.R), everything works as expected.
I'd submit a pull request, but I'm not 100% clear on the mechanism that produces the error, nor on the purpose of copy in localize_file.
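For anyone who cannot patch the package, a user-side workaround might look like the sketch below. It assumes that the stale copy is tied to the fixed name of the uploaded file, so that handing tabulizer a uniquely named file sidesteps it. unique_pdf_copy is a hypothetical helper, not part of tabulizer or shiny.

```r
# Hypothetical helper (not part of tabulizer or shiny): copy an uploaded
# file to a uniquely named .pdf inside the session temp directory.
unique_pdf_copy <- function(datapath) {
  target <- tempfile(pattern = "upload_", fileext = ".pdf")
  ok <- file.copy(from = datapath, to = target, overwrite = TRUE)
  if (!ok) stop("could not copy uploaded file: ", datapath)
  target
}

# Intended use in the server function (assumes tabulizer is installed):
#   extract_text(unique_pdf_copy(inFile$datapath))
```

Each upload then gets its own path, at the cost of leaving extra files in the temp directory until the R session ends.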
A bit more on this. I think the more significant issue is with regard to line 13 in utils.R. The default behavior for file.copy is NOT to overwrite the file if it already exists. The way that Shiny uploads files is such that they will always exist in a subfolder of tempdir() and will have a basename of "0" or some such. localize_file will take its source and copy it to tempdir(). On the second call, there is already a file with a name like "0.pdf" at that location. Because we're not passing in overwrite = TRUE, the copy never takes place. This is a very easy change that I've tested for my specific use case. I'll go ahead and submit a pull request.
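The file.copy behavior described here can be reproduced with base R alone. The sketch below is a standalone illustration, not tabulizer's code; the directory and file names are made up to mimic an upload named "0" being copied to "0.pdf".

```r
# Made-up source and destination directories standing in for the Shiny
# upload folder and tempdir().
src_dir <- tempfile("src_")
dst_dir <- tempfile("dst_")
dir.create(src_dir)
dir.create(dst_dir)
src <- file.path(src_dir, "0")
dst <- file.path(dst_dir, "0.pdf")

writeLines("first upload", src)
first_copy <- file.copy(src, dst)    # TRUE: target did not exist yet

writeLines("second upload", src)
second_copy <- file.copy(src, dst)   # FALSE: target exists, nothing is copied
stale <- readLines(dst)              # still "first upload"

third_copy <- file.copy(src, dst, overwrite = TRUE)  # TRUE: target replaced
fresh <- readLines(dst)              # now "second upload"
```

Note that file.copy reports the skipped copy only through its FALSE return value, which is easy to miss when the result is not checked.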
When I use extract_areas in a Shiny app, it gives an error: Can't call runApp() from within runApp(). If your application code contains runApp(), please remove it.
I understand that extract_areas is itself built on Shiny, but I want to upload PDFs using Shiny's fileInput and identify pages in order to locate areas and extract them.
Is there a simple way to do this?
Thanks in advance
@abhivedula You can't use extract_areas() in a shiny app because it is a shiny app. If you want to use the underlying functionality, pass input to extract_tables().
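As a rough sketch of what passing input to extract_tables() could involve: the function accepts an area argument, a list of numeric vectors of the form c(top, left, bottom, right). The helper below, brush_to_area, is hypothetical. It assumes the selection arrives as a list with xmin, xmax, ymin and ymax (the shape of a Shiny plot brush), already expressed in PDF points with the origin at the bottom-left of the page, and that page_height is known.

```r
# Hypothetical helper: convert a rectangular selection into the
# c(top, left, bottom, right) form used by extract_tables(area = ...).
# Assumes coordinates are in PDF points with a bottom-left origin.
brush_to_area <- function(brush, page_height) {
  c(top    = page_height - brush$ymax,
    left   = brush$xmin,
    bottom = page_height - brush$ymin,
    right  = brush$xmax)
}

# Example selection on a US Letter page (792 points tall).
area <- brush_to_area(list(xmin = 50, xmax = 300, ymin = 100, ymax = 400),
                      page_height = 792)

# Intended use (assumes tabulizer is installed and inFile is the upload):
#   extract_tables(inFile$datapath, pages = 1, area = list(area), guess = FALSE)
```

If the selection is made on a rendered image of the page rather than in PDF points, the coordinates would also need rescaling, which this sketch does not attempt.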
Thanks for the reply, Leeper. I have an additional question: can we develop a customized version of extract_areas as a Shiny app to host online?
You're more than welcome to. This package is licensed MIT, so as long as you comply with that you should be good to go.
Sorry to bother you. One last question on this topic. I am developing a Shiny app to host on shinyapps.io. It should have a file upload feature, and once a PDF is uploaded, it should offer the option to select areas for extracting tables, just like extract_areas does in the viewer. I am relatively new to R, so please bear with my ignorance.
I am not a shiny expert so I really can't provide any assistance with that.
Thanks a lot leeper. Btw tabulizer is awesome :)