
Error: Cannot find progress bar (req_perform_parallel)

Open JBGruber opened this issue 1 year ago • 18 comments

When I run a req_perform_parallel with a progress bar and abort the operation, every new call to req_perform_parallel will error until the R session is restarted. Here is a minimal example:

library(httr2)
reqs <- list(
  request(example_url()) |>
    req_url_path("/delay") |> 
    req_url_path_append(sample(1:10, 1))
) |>
  rep(100)

resps <- req_perform_parallel(reqs, progress = TRUE)

### stop with Esc ##
resps <- req_perform_parallel(reqs, progress = TRUE)
#> Error in cli::cli_progress_update(..., id = id) : 
#>   Cannot find progress bar `cli-25573-7`
#> Error in cli::cli_progress_update(..., id = id) : 
#>   Cannot find progress bar `cli-25573-7`

This does not apply to req_perform_sequential.

JBGruber avatar Dec 11 '24 17:12 JBGruber

Hmmm create_progress_bar() should probably add an exit handler to the calling environment that automatically closes the progress bar. But I'm a bit surprised that's necessary because I don't understand why it would try and reuse the previous progress bar.

hadley avatar Dec 11 '24 17:12 hadley

I also have no idea. And here is some more weirdness. I just noticed that it does not happen every time! If I hit Esc too fast, it does not seem to reuse the id???

Screencast_20241211_183723.webm

JBGruber avatar Dec 11 '24 17:12 JBGruber

@gaborcsardi any idea what's going on here? The progress bar is created in a helper function: https://github.com/r-lib/httr2/blob/e972770199f674eca4c64ca8161235e5745683dd/R/utils.R#L242-L284

Maybe something is going wrong with how I set .envir?

hadley avatar Dec 11 '24 17:12 hadley

I am not sure, but my guess is that the progress bar's env is already removed, but the curl multi-handle is still alive and you get a final update() call for it.

gaborcsardi avatar Dec 11 '24 19:12 gaborcsardi

Ooooh I bet that's it.

hadley avatar Dec 11 '24 20:12 hadley

I think we can fix this by using a separate pool for each req_perform_parallel() call, rather than relying on the default global pool.
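A minimal sketch of the idea at the curl level, not httr2's actual implementation: it assumes the curl package's `new_pool()`, `curl_fetch_multi()`, `multi_list()` and `multi_cancel()`, and the URL is only a placeholder. Handles queued in one pool are invisible to another, so a leftover handle from an aborted call would not be picked up by the next one.

```r
library(curl)

# Two independent pools, standing in for two separate
# req_perform_parallel() calls (illustrative only).
pool_first <- new_pool()
pool_second <- new_pool()

# Queue a request on the first pool only. Nothing is sent until
# multi_run() is called, so this does not touch the network.
handle <- curl_fetch_multi(
  "https://example.com/delay/10",  # placeholder URL
  done = function(res) NULL,
  fail = function(msg) NULL,
  pool = pool_first
)

# Pending handles are tracked per pool.
pending_first <- length(multi_list(pool = pool_first))
pending_second <- length(multi_list(pool = pool_second))

# Remove the queued handle again so nothing lingers in the pool.
multi_cancel(handle)
```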

hadley avatar Dec 11 '24 22:12 hadley

I think you can also tryCatch() and ignore the error.
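As a rough sketch of that suggestion: the wrapper name `safe_progress_update()` is hypothetical (it is neither a cli nor an httr2 function), and it simply swallows any error raised by `cli::cli_progress_update()`, including the "Cannot find progress bar" one.

```r
library(cli)

# Hypothetical helper: update a progress bar, but ignore the error
# raised when the bar with this id no longer exists.
safe_progress_update <- function(id, ...) {
  tryCatch(
    cli::cli_progress_update(..., id = id),
    error = function(cnd) invisible(NULL)
  )
}

# An id that was never created would normally raise an error;
# here the call returns NULL instead.
result <- safe_progress_update("cli-no-such-bar")
```

Note that this hides every error from the update call, not just the missing-bar case, so a narrower condition check may be preferable.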

gaborcsardi avatar Dec 11 '24 22:12 gaborcsardi

I can't reliably reproduce this. I tried what I thought would illustrate it more reliably:

library(httr2)
options(cli.progress_show_after = 0)

reqs <- list(
  request(example_url()) |> req_url_path("/delay/10"),
  request(example_url()) |> req_url_path("/delay/1"),
  request(example_url()) |> req_url_path("/delay/10"),
  request(example_url()) |> req_url_path("/delay/1"),
  request(example_url()) |> req_url_path("/delay/10"),
  request(example_url()) |> req_url_path("/delay/1")
) 
resps <- req_perform_parallel(reqs, progress = TRUE)

But I only managed to see the error once, so I suspect it might be more of a problem on Windows.

hadley avatar Dec 20 '24 20:12 hadley

I'm on Arch Linux (BTW)

It only seems to happen when I interrupt the process after a couple of seconds and not every time. Which makes this the weirdest behaviour that I've experienced so far in R.

Can you start another process after you see the error once? I need to restart R.

JBGruber avatar Dec 20 '24 20:12 JBGruber

But my experiments suggest that there might be a bigger problem — I think terminating req_perform_parallel() doesn't actually terminate the other ongoing requests.

...

Hmmmm, some experiments suggest that maybe that's not the case?

hadley avatar Dec 20 '24 20:12 hadley

@JBGruber could you please try #602 and see if it fixes the problem for you?

hadley avatar Dec 20 '24 21:12 hadley

This fixes the error :blush:. But I observed some other unexpected behaviour.

  1. After interrupting, the next attempt fails with "! Operation timed out after 10000 milliseconds with 0 bytes received", which shouldn't happen. I decreased the delay to 2 seconds on the second attempt to make sure this isn't a coincidence. If the combined delay of the first and second attempts is below 10 seconds, it does not seem to happen.
  2. If I wait until 50% of the requests are done and then interrupt, the return object is still produced, with half of the responses missing (if I wait until 80%, 20% are empty, and so on).

Sorry for the screen recording, but I don't know how else to show it. Up to second 46, you see the first issue. I wait a little and then show issue 2 from second 56.

httr2-pr-behaviour.webm

I think this is all far less problematic than having to restart R to be able to run req_perform_parallel() with a progress bar. But I think it suggests that you might have been right and terminating req_perform_parallel() doesn't actually terminate the other ongoing requests.

JBGruber avatar Dec 21 '24 09:12 JBGruber

@JBGruber I did a bunch of experiments and convinced myself that they're getting cancelled on my machine, but maybe that code path doesn't work correctly on Linux? I'll think about how to test that hypothesis.

hadley avatar Dec 21 '24 13:12 hadley

@gaborcsardi any ideas on why this might be different on Arch?

hadley avatar Jan 06 '25 13:01 hadley

Different libcurl version or settings, potentially. But it can also just be random.

gaborcsardi avatar Jan 06 '25 13:01 gaborcsardi

curl::curl_version()
#> $version
#> [1] "8.11.1"
#> 
#> $headers
#> [1] "8.11.0"
#> 
#> $ssl_version
#> [1] "OpenSSL/3.4.0"
#> 
#> $libz_version
#> [1] "1.3.1"
#> 
#> $libssh_version
#> [1] "libssh2/1.11.1"
#> 
#> $libidn_version
#> [1] "2.3.7"
#> 
#> $host
#> [1] "x86_64-pc-linux-gnu"
#> 
#> $protocols
#>  [1] "dict"    "file"    "ftp"     "ftps"    "gopher"  "gophers" "http"   
#>  [8] "https"   "imap"    "imaps"   "mqtt"    "pop3"    "pop3s"   "rtsp"   
#> [15] "scp"     "sftp"    "smb"     "smbs"    "smtp"    "smtps"   "telnet" 
#> [22] "tftp"    "ws"      "wss"    
#> 
#> $ipv6
#> [1] TRUE
#> 
#> $http2
#> [1] TRUE
#> 
#> $idn
#> [1] TRUE
#> 
#> $url_parser
#> [1] TRUE

Created on 2025-01-06 with reprex v2.1.1

JBGruber avatar Jan 06 '25 13:01 JBGruber

One particular difference could be HTTP/1.1 vs HTTP/2, i.e. whether libcurl has HTTP/2 support and whether it is used by default. AFAIR cancellation is very different for HTTP/2, because of the multiplexing.
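One way to test this hypothesis might be to pin the protocol version and compare behaviour. The sketch below assumes that `2` is the integer value of libcurl's `CURL_HTTP_VERSION_1_1` constant and passes it through `req_options()`; the helper name `force_http_1_1()` and the URL are placeholders for illustration.

```r
library(httr2)

# Hypothetical helper: ask libcurl to use HTTP/1.1 for this request.
# Assumption: 2L corresponds to CURL_HTTP_VERSION_1_1.
force_http_1_1 <- function(req) {
  req_options(req, http_version = 2L)
}

# Building the request does not contact the server.
req <- request("https://example.com/delay/10") |>
  force_http_1_1()
```

Running the interrupt experiment once with and once without this option would show whether the cancellation difference tracks the protocol version.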

gaborcsardi avatar Jan 06 '25 13:01 gaborcsardi

Since I can't reproduce the problem easily, and I need to get this release out of the door, I'm going to push this to the future.

hadley avatar Jan 07 '25 23:01 hadley