
r_bg process stops after some time when the session is reloaded or closed

Open sm2607 opened this issue 1 year ago • 8 comments

Hi, I am using the callr package, and it works really well for running async processes from R Shiny. In a nutshell, this is what I do:

i. The user launches an analysis.
ii. The analysis parameters are saved in a database.
iii. A background process is created with r_bg (it takes around 2-3 hours); it also updates the status in the database.
iv. The user closes the browser and checks the status some time later by querying the database.
v. Meanwhile, another user can launch an analysis the same way, and a separate r_bg session is created for that one too.

It works fine, but sometimes the process running through r_bg is closed midway without an error (I use tryCatch to write out any error). Could you please suggest what I should do to stop the background R session from closing automatically?

sm2607 avatar Aug 30 '22 08:08 sm2607

Can you please try to create a minimal reproducible example (i.e. the actual code that you run) that has this behavior?

gaborcsardi avatar Aug 30 '22 08:08 gaborcsardi

Hi, please find the minimal reproducible example below. I deploy this app with both Rscript.exe and RStudio Connect, and I get this issue in both. Is this due to cache memory or something else?

status.csv

library(shiny)
my_func<-function(arg,jobid){
  jobstatus<-read.csv('status.csv')
  job<-which(jobstatus$jobid==jobid)
  remain_process<-length(which(!(jobstatus$jobid[-c(job)] %in% (c("Completed","Analysis Failed")))))
  
  while(remain_process>0){
    
    jobstatus$status[job]<- 'Queued'
    write.csv(jobstatus,'status.csv',row.names = F)
    print('waiting for other Analysis')
    Sys.sleep(10)
    jobstatus<-read.csv('status.csv')
    job<-which(jobstatus$jobid==jobid)
    remain_process<-length(which(!(jobstatus$jobid[-c(job)] %in% (c("Completed","Analysis Failed")))))
  }
  jobstatus<-read.csv('status.csv')
  job<-which(jobstatus$jobid==jobid)
  jobstatus$status[job]<- 'Analysis started'
  write.csv(jobstatus,'status.csv',row.names = F)
  
  #some long running task
  Sys.sleep(10000)
  jobstatus<-read.csv('status.csv')
  job<-which(jobstatus$jobid==jobid)
  jobstatus$status[job]<- 'Completed'
  write.csv(jobstatus,'status.csv',row.names = F)
  
  
}
ui <- fluidPage(
  textInput('args',"Argument",value = 0),
  actionButton("run","RUN")
)
server <- function(input, output, session) {
  
  observeEvent(input$run,{
    data<-read.csv('status.csv')
    data2<-data.frame("jobid"=1,	"arg"=1,	"status"=1)
    data2$jobid<- 100+nrow(data)+1
    data2$arg<-input$args
    data2$status<- "processing"
    data<-rbind(data,data2)
    write.csv(data,'status.csv',row.names = F)

    df<- callr::r_bg(my_func,args = list(arg= data2$arg[1],jobid=data2$jobid[1]),stdout =data2$jobid[1])
  })
}
shinyApp(ui, server)

sm2607 avatar Aug 30 '22 09:08 sm2607

You are not keeping the handle of the background process (df in your code), so when that is garbage collected callr will kill the background process. You need to keep the df objects as long as the background processes are running.
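One way to keep the handles is a registry that lives outside the server function, for example at the top of app.R. The following is only a minimal sketch: `bg_jobs`, `start_job()` and `prune_jobs()` are hypothetical names, not part of callr, and the handles survive only as long as the R process hosting the app keeps running.

```r
# Registry defined outside server(), so handles are not dropped when a
# Shiny session ends. All names here are hypothetical, for illustration only.
bg_jobs <- new.env(parent = emptyenv())

# Start a background job and store its handle under the job id.
start_job <- function(jobid, fun, args = list()) {
  handle <- callr::r_bg(fun, args = args)
  assign(as.character(jobid), handle, envir = bg_jobs)
  invisible(handle)
}

# Drop the handles of finished jobs so the registry does not grow forever.
prune_jobs <- function() {
  for (id in ls(bg_jobs)) {
    if (!get(id, envir = bg_jobs)$is_alive()) {
      rm(list = id, envir = bg_jobs)
    }
  }
  invisible(ls(bg_jobs))
}
```

Inside `observeEvent()` you would then call something like `start_job(data2$jobid[1], my_func, list(arg = data2$arg[1], jobid = data2$jobid[1]))` instead of assigning the handle to a local variable.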

Btw. another issue with this code is that if you have multiple background processes writing to the same file, then the file can be corrupted or updates might be lost.
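If the status has to stay in a shared file, one option is to serialize the read-modify-write cycle with a lock file. This is a hedged sketch that assumes the filelock package is installed; `update_status()` and the `.lock` suffix are hypothetical choices, not something the code above already uses.

```r
# Hypothetical helper: update one job's status while holding an exclusive
# lock, so concurrent background processes cannot interleave their writes.
update_status <- function(jobid, status, path = "status.csv") {
  # Wait up to 10 seconds (timeout is in milliseconds) for the lock.
  lck <- filelock::lock(paste0(path, ".lock"), exclusive = TRUE, timeout = 10000)
  if (is.null(lck)) {
    stop("could not acquire the lock for ", path)
  }
  on.exit(filelock::unlock(lck), add = TRUE)

  jobs <- read.csv(path, stringsAsFactors = FALSE)
  jobs$status[jobs$jobid == jobid] <- status
  write.csv(jobs, path, row.names = FALSE)
  invisible(jobs)
}
```

Every process that touches the file has to go through the same lock for this to help; a database with transactions avoids the problem altogether.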

gaborcsardi avatar Aug 30 '22 09:08 gaborcsardi

Actually, I also tried the code below to keep the handles of the background sessions, but since this is a long-running process, when I check again after refreshing or reloading the app, these values have been reset. Can you please suggest what I should change to keep the background process handles across sessions?

library(shiny)
my_func<-function(arg,jobid){
  jobstatus<-read.csv('status.csv')
  job<-which(jobstatus$jobid==jobid)
  remain_process<-length(which(!(jobstatus$jobid[-c(job)] %in% (c("Completed","Analysis Failed")))))
  
  while(remain_process>0){
    
    jobstatus$status[job]<- 'Queued'
    write.csv(jobstatus,'status.csv',row.names = F)
    print('waiting for other Analysis')
    Sys.sleep(10)
    jobstatus<-read.csv('status.csv')
    job<-which(jobstatus$jobid==jobid)
    remain_process<-length(which(!(jobstatus$jobid[-c(job)] %in% (c("Completed","Analysis Failed")))))
  }
  jobstatus<-read.csv('status.csv')
  job<-which(jobstatus$jobid==jobid)
  jobstatus$status[job]<- 'Analysis started'
  write.csv(jobstatus,'status.csv',row.names = F)
  
  #some long running task
  Sys.sleep(10000)
  jobstatus<-read.csv('status.csv')
  job<-which(jobstatus$jobid==jobid)
  jobstatus$status[job]<- 'Completed'
  write.csv(jobstatus,'status.csv',row.names = F)
  
  
}
ui <- fluidPage(
  textInput('args',"Argument",value = 0),
  actionButton("run","RUN"),
  textOutput("did_it_work")
)
server <- function(input, output, session) {
  df2<-reactiveValues(df=list(),df_flag=1)
  
 
  
  observeEvent(input$run,{
    data<-read.csv('status.csv')
    data2<-data.frame("jobid"=1,	"arg"=1,	"status"=1)
    data2$jobid<- 100+nrow(data)+1
    data2$arg<-input$args
    data2$status<- "processing"
    data<-rbind(data,data2)
    write.csv(data,'status.csv',row.names = F)

    df2$df[[df2$df_flag]]<- callr::r_bg(my_func,args = list(arg= data2$arg[1],jobid=data2$jobid[1]),stdout =data2$jobid[1])
    df2$df_flag<-df2$df_flag+1
  })
  check <- reactive({  
    if(df2$df_flag>1){
      flag_all<-NULL
      
      for (i in 1:(df2$df_flag-1)){
        flag1<-(df2$df[[i]]$poll_io(0)["process"] == "timeout")
        flag1<-paste0(i,"_",flag1)
        flag_all<-c(flag_all,flag1)
      }
      value<-paste(flag_all,collapse = ',')
      invalidateLater(3000)
    } else {
      value<-NULL
    }
    return(value)
  })
  output$did_it_work <- renderText({
    check()
  })
}
shinyApp(ui, server)

Also, to prevent crashes, I am now writing the data to a database instead of a CSV file.

sm2607 avatar Aug 30 '22 10:08 sm2607

I am sorry, I cannot advise you on your shiny app.

You can keep callr background processes alive between sessions by setting cleanup = FALSE. But note that it is then your responsibility to clean them up:

❯ p1 <- callr::r_bg(function() Sys.sleep(10000))
❯ ps::ps_children()
[[1]]
<ps::ps_handle> PID=30049, NAME=R, AT=2022-08-30 10:08:58

❯ rm(p1); gc(); gc()
❯ ps::ps_children()
list()
❯ p2 <- callr::r_bg(function() Sys.sleep(10000), cleanup = FALSE)
❯ ps::ps_children()
[[1]]
<ps::ps_handle> PID=30216, NAME=R, AT=2022-08-30 10:09:45

❯ rm(p2); gc(); gc()
❯ ps::ps_children()
[[1]]
<ps::ps_handle> PID=30216, NAME=R, AT=2022-08-30 10:09:45

I also suspect that Connect will suspend the container running your app after a time of inactivity, killing all processes, but this is potentially something you can configure in Connect.

gaborcsardi avatar Aug 30 '22 10:08 gaborcsardi

OK, thanks for your suggestions. I will try both, and hopefully it will work. Also, since you said I have to clean them up myself, can you suggest an approach, method or function that I can use to clean up after processing completes?

sm2607 avatar Aug 30 '22 10:08 sm2607

After they run their function they will quit. You "only" need to clean them up explicitly if you restart or stop the app, or something similar.

To clean a callr process in another session, you need to save its pid and start time, and then use the ps package:

p2 <- callr::r_bg(function() Sys.sleep(10000), cleanup = FALSE)
pid <- p2$get_pid()
tim <- p2$get_start_time()
rm(p2)
# still running
ps::ps_children()
#> [[1]]
#> <ps::ps_handle> PID=31657, NAME=R, AT=2022-08-30 11:21:28

# new handle
p2x <- ps::ps_handle(pid, tim)
p2x
#> <ps::ps_handle> PID=31657, NAME=R, AT=2022-08-30 11:21:28

ps::ps_kill(p2x)
ps::ps_children()
#> list()

ps_kill() errors if the process has already quit before creating the p2x handle:

ps::ps_kill(ps::ps_handle(100L, tim))
#> Error: No such process, pid 100, ???

gaborcsardi avatar Aug 30 '22 11:08 gaborcsardi

Hi, thanks, the cleanup argument works as expected. However, I am not able to clean up using the pid and time info. To clean up from another session, I tried the approach below:

p2x <- ps::ps_handle(pid, tim)
p2x
#ps::ps_handle PID=20432, NAME=Rterm.exe, AT=2022-08-31 05:13:13
ps::ps_kill(p2x)
#< Error: No such process, pid 20432, ???

whereas this process is still active in the first session: PROCESS 'Rterm', running, pid 20432

sm2607 avatar Aug 31 '22 05:08 sm2607

I don't know why the time stamp would not work; it could be a bug in ps. In any case, you can just use the pid alone, with ps_kill(ps_handle(pid)).
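As a rough sketch of how that could be wrapped up, the helper below tries the saved pid together with the start time first and falls back to the pid alone. `kill_saved_process()` is a hypothetical name, not a ps function, and the pid-only fallback carries the usual risk that the operating system has reused the pid for an unrelated process.

```r
# Hypothetical helper: kill a process from a pid (and optionally a start
# time) that was saved earlier, e.g. from p$get_pid() and p$get_start_time().
# Returns TRUE if a running process was found and killed, FALSE otherwise.
kill_saved_process <- function(pid, time = NULL) {
  pid <- as.integer(pid)

  # First try the stricter handle that also checks the start time.
  handle <- tryCatch(ps::ps_handle(pid, time), error = function(e) NULL)

  if (is.null(handle) || !ps::ps_is_running(handle)) {
    # Fall back to the pid alone. Caution: this cannot detect pid reuse.
    handle <- tryCatch(ps::ps_handle(pid), error = function(e) NULL)
  }

  if (is.null(handle) || !ps::ps_is_running(handle)) {
    return(FALSE)
  }

  tryCatch({
    ps::ps_kill(handle)
    TRUE
  }, error = function(e) FALSE)
}

# Possible usage across sessions (file name is an assumption):
#   saveRDS(list(pid = p2$get_pid(), time = p2$get_start_time()), "job.rds")
#   info <- readRDS("job.rds")
#   kill_saved_process(info$pid, info$time)
```

Returning FALSE instead of raising an error makes the helper safe to call for jobs that have already finished on their own.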

gaborcsardi avatar Oct 28 '22 08:10 gaborcsardi