tabulapdf

Integrating extract_tables with a Shiny app: no reactivity

Open • Aeilert opened this issue on Jul 05 '16 • 12 comments

Thanks for this awesome package. It works well on all the PDF documents I have tried it on. However, I have a problem integrating the extract_tables() / extract_text() functions into my own Shiny app.

More specifically, the app doesn't seem to register that a new file has been uploaded through fileInput(): the displayed output stays the same. With other R functions, such as read.csv() or pdf_text() from the pdftools package, the output updates instantly.

This works with pdftools:

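library(shiny)
library(pdftools)

shinyServer(function(input, output) {
    output$contents <- renderText({
        inFile <- input$file1

        # Nothing to show until a file has been uploaded
        if (is.null(inFile))
            return(NULL)

        # pdf_text() returns one character string per page
        pdf_text(inFile$datapath)
    })
})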

This doesn't work with tabulizer:

library(shiny)
library(tabulizer)

shinyServer(function(input, output) {
    output$contents <- renderText({  # switch to renderTable() for the extract_tables() line

        inFile <- input$file1

        # Nothing to show until a file has been uploaded
        if (is.null(inFile))
            return(NULL)

        extract_text(inFile$datapath)

        # Alternatives tried in place of the line above:
        # extract_tables(inFile$datapath)[[1]]
        # read.csv(inFile$datapath, header = input$header, sep = input$sep,
        #          quote = input$quote)
    })
})

ui.R is the same in both cases.

library(shiny)

shinyUI(fluidPage(
    titlePanel("Uploading Files"),
    sidebarLayout(
        sidebarPanel(
            # Restrict the file picker to PDFs (by extension and MIME type)
            fileInput("file1", "Choose PDF File",
                      accept = c(".pdf", "application/pdf"))
        ),
        mainPanel(
            tableOutput("contents")
        )
    )
))

I'm working on a Mac Pro with OS X 10.11.4, R 3.2.3 and RStudio 0.99.887.

Aeilert, Jul 05 '16 12:07

Okay, I suspect the issue has something to do with how tabulizer passes the file through localize_file(). Do you know what your Shiny app gives as the value of inFile$datapath?

leeper, Jul 05 '16 14:07

I see.

inFile$datapath gives a path inside a temporary folder, of the form "/var/folders/v4/gmb8wwgx1jj_94hdm2rc4bjh0000gn/T//RtmpYGRbyk/ced0c48b4af993d4d1ca7da4/0". Interestingly, this path changes when a new file is uploaded. The other values (name, size) in the input$file1 (inFile) data frame are also updated, but the displayed output is not.

I have also tried various uses of reactive() and reactiveValues() to force the cache to reset, but without success so far.

And kudos for the swift reply.

Aeilert, Jul 05 '16 16:07

Do you have any suggestions or workarounds for this problem?

If the app only needs to work inside RStudio, you could use the base R function file.choose() instead of fileInput(). But I'm interested in a browser app.

Aeilert, Aug 02 '16 10:08

I'm experiencing the same issue. When a file is uploaded, Shiny writes it to a subdirectory below a temporary directory. load_doc() calls localize_file() with copy = TRUE, which makes a copy of the uploaded file. On later calls, this copy never seems to get replaced.

I cloned the repo and ran it locally with some edits. If I change the copy argument that load_doc() passes to localize_file() from TRUE to FALSE (line 23 of utils.R), everything works as expected.
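If you would rather not edit utils.R, one possible workaround (an untested sketch, not part of tabulizer or shiny) is to give every upload a unique file name before handing it to tabulizer, so a copy left over from an earlier upload can never be reused. This assumes the stale copy is keyed on the upload's basename, which Shiny sets to "0" or similar. The helper `unique_pdf_copy()` is hypothetical; it writes into a fresh subfolder of tempdir() rather than tempdir() itself, so that tabulizer's own copy into tempdir() never has the same source and destination path.

```r
# Hypothetical helper (not part of tabulizer or shiny): copy an uploaded file
# to a uniquely named .pdf inside a fresh subfolder of tempdir() and return
# the new path. Every call yields a different basename.
unique_pdf_copy <- function(path) {
  dest_dir <- tempfile(pattern = "upload_")   # unique folder name, not yet created
  dir.create(dest_dir)
  # Reuse the unique folder name as the file's basename, e.g. "upload_1a2b3c.pdf"
  dest <- file.path(dest_dir, paste0(basename(dest_dir), ".pdf"))
  if (!file.copy(path, dest))
    stop("could not copy ", path, " to ", dest)
  dest
}
```

In the server above, the tabulizer call would then become `extract_text(unique_pdf_copy(inFile$datapath))`. The cost is one extra copy of each upload on disk for the lifetime of the R session.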

I'd submit a pull request, but I'm not 100% clear on the mechanism that produces the error, nor on the purpose of the copy argument in localize_file().

PirateGrunt, Feb 07 '17 02:02

A bit more on this. I think the more significant issue is line 13 of utils.R. By default, file.copy() does NOT overwrite the destination if it already exists. Shiny stores uploaded files in a subfolder of tempdir() with a basename of "0" or some such. localize_file() copies its source into tempdir(), so on the second call there is already a file with a name like "0.pdf" at that location. Because overwrite = TRUE is not passed, the copy never takes place and the file from the first upload is read again. This is a very easy change, which I've tested for my specific use case. I'll go ahead and submit a pull request.
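The file.copy() behaviour described above can be reproduced in base R, without Shiny or tabulizer. This is a minimal sketch under the assumptions stated in the thread: each upload is stored as ".../<random folder>/0" and gets localized to "<tempdir>/0.pdf". The helper `fake_upload()` and the scratch directory (standing in for tempdir()) are invented for illustration.

```r
# Scratch directory standing in for tempdir(), so the demo leaves session files alone
scratch <- tempfile("scratch_")
dir.create(scratch)

# Hypothetical stand-in for a Shiny upload: a file named "0" in its own subfolder
fake_upload <- function(content) {
  dir <- tempfile("upload_", tmpdir = scratch)
  dir.create(dir)
  path <- file.path(dir, "0")
  writeLines(content, path)
  path
}
first  <- fake_upload("contents of first PDF")
second <- fake_upload("contents of second PDF")

# Both uploads map to the same localized name
localized <- file.path(scratch, "0.pdf")

# First call: the destination does not exist yet, so the copy succeeds
copied_first <- file.copy(first, localized)

# Second call with the default overwrite = FALSE: the destination exists,
# so nothing is copied and the first upload's contents remain
copied_second <- file.copy(second, localized)
stale <- readLines(localized)

# With overwrite = TRUE the stale copy is replaced
copied_fixed <- file.copy(second, localized, overwrite = TRUE)
fresh <- readLines(localized)

print(c(copied_first, copied_second, copied_fixed))  # TRUE FALSE TRUE
print(stale)  # "contents of first PDF"
print(fresh)  # "contents of second PDF"
```

Note that file.copy() signals the skipped copy only through its FALSE return value, with no warning, which is why the app silently kept showing the first file.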

PirateGrunt, Feb 07 '17 15:02

When I use extract_areas() inside a Shiny app, it fails with: Error: Can't call runApp() from within runApp(). If your application code contains runApp(), please remove it. I understand that extract_areas() is itself built on Shiny, but I want to upload PDFs with Shiny's fileInput(), identify pages, locate areas on them and extract the tables. Is there a simple way to do that? Thanks in advance.

ghost, Jun 14 '17 11:06

@abhivedula You can't use extract_areas() in a Shiny app because it is itself a Shiny app. If you want the underlying functionality, pass your app's inputs (such as the page and area) to extract_tables() directly.
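As a rough sketch of what "passing input to extract_tables()" could look like: extract_tables() accepts pages and an area list whose entries are c(top, left, bottom, right) in PDF points. The helper `brush_to_area()` below is hypothetical. It assumes the app displays a rendered image of the page, that the user's rectangle arrives in image pixels (origin top-left, y growing downward), and that the page size in points is known (for example from tabulizer's get_page_dims()).

```r
# Hypothetical helper (not part of tabulizer): convert a rectangle selected on a
# rendered page image, in pixels, into the c(top, left, bottom, right) vector in
# PDF points used by extract_tables()'s `area` argument.
brush_to_area <- function(xmin, xmax, ymin, ymax,
                          img_width, img_height,
                          page_width, page_height) {
  sx <- page_width / img_width    # points per pixel, horizontally
  sy <- page_height / img_height  # points per pixel, vertically
  c(ymin * sy, xmin * sx, ymax * sy, xmax * sx)  # top, left, bottom, right
}

# Example: a US Letter page (612 x 792 pt) rendered at 2x (1224 x 1584 px)
area <- brush_to_area(xmin = 100, xmax = 1100, ymin = 200, ymax = 800,
                      img_width = 1224, img_height = 1584,
                      page_width = 612, page_height = 792)
print(area)

# In the Shiny server (not run here; needs tabulizer and Java), something like:
# extract_tables(inFile$datapath, pages = 1, area = list(area), guess = FALSE)
```

Setting guess = FALSE is intended to make tabulizer use the supplied area rather than detecting tables on its own; check the extract_tables() documentation for your installed version.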

leeper, Jun 14 '17 21:06

Thanks for the reply, leeper. I have an additional question: can we develop a customized version of extract_areas() as a Shiny app and host it online?

ghost, Jun 15 '17 11:06

You're more than welcome to. This package is MIT-licensed, so as long as you comply with that license you should be good to go.

leeper, Jun 15 '17 11:06

Sorry to bother you; one last question on this topic. I am developing a Shiny app to host on shinyapps.io. It should have a file upload feature, and once a PDF is uploaded, it should let the user select areas to extract tables from, just like extract_areas() does in the viewer. I am relatively new to R, so please bear with my ignorance.

ghost, Jun 15 '17 11:06

I am not a shiny expert so I really can't provide any assistance with that.

leeper, Jun 15 '17 12:06

Thanks a lot, leeper. Btw, tabulizer is awesome :)

ghost, Jun 15 '17 12:06