julia icon indicating copy to clipboard operation
julia copied to clipboard

CI: On `i686-linux-gnu`, the `Downloads` test set is failing frequently (but not always) with `nested task error: RequestError: getaddrinfo() thread failed to start while requesting https://httpbingo.julialang.org/delay/2?id=100`

Open DilumAluthge opened this issue 1 year ago • 7 comments

The error seems to be: nested task error: RequestError: getaddrinfo() thread failed to start while requesting https://httpbingo.julialang.org/delay/2?id=100

Relevant log snippet:

Error in testset Downloads:
Error During Test at /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/runtests.jl:355
  Got exception outside of a @test
  TaskFailedException
  Stacktrace:
    [1] macro expansion
      @ ./experimental.jl:102 [inlined]
    [2] macro expansion
      @ ./timing.jl:394 [inlined]
    [3] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/runtests.jl:366 [inlined]
    [4] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Test/src/Test.jl:1388 [inlined]
    [5] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/runtests.jl:356 [inlined]
    [6] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Test/src/Test.jl:1388 [inlined]
    [7] top-level scope
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/runtests.jl:4
    [8] include
      @ ./Base.jl:424 [inlined]
    [9] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/test/testdefs.jl:24 [inlined]
   [10] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Test/src/Test.jl:1388 [inlined]
   [11] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/test/testdefs.jl:23 [inlined]
   [12] macro expansion
      @ ./timing.jl:475 [inlined]
   [13] runtests(name::String, path::String, isolate::Bool; seed::UInt128)
      @ Main /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/test/testdefs.jl:21
   [14] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, UInt128, Tuple{Symbol}, NamedTuple{(:seed,), Tuple{UInt128}}})
      @ Base ./essentials.jl:810
   [15] (::Distributed.var"#110#112"{Distributed.CallMsg{:call_fetch}})()
      @ Distributed /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:285
   [16] run_work_thunk(thunk::Distributed.var"#110#112"{Distributed.CallMsg{:call_fetch}}, print_error::Bool)
      @ Distributed /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:70
   [17] macro expansion
      @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:285 [inlined]
   [18] (::Distributed.var"#109#111"{Distributed.CallMsg{:call_fetch}, Distributed.MsgHeader, Sockets.TCPSocket})()
      @ Distributed ./task.jl:501
      nested task error: RequestError: getaddrinfo() thread failed to start while requesting https://httpbingo.julialang.org/delay/2?id=100
      Stacktrace:
        [1] (::Downloads.var"#9#18"{IOBuffer, Base.DevNull, Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Nothing, Bool, String, Bool, Bool})(easy::Downloads.Curl.Easy)
          @ Downloads /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/src/Downloads.jl:388
        [2] with_handle(f::Downloads.var"#9#18"{IOBuffer, Base.DevNull, Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Nothing, Bool, String, Bool, Bool}, handle::Downloads.Curl.Easy)
          @ Downloads.Curl /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/src/Curl/Curl.jl:88
        [3] #8
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/src/Downloads.jl:329 [inlined]
        [4] arg_write(f::Downloads.var"#8#17"{Base.DevNull, Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Nothing, Bool, String, Bool, Bool}, arg::IOBuffer)
          @ ArgTools /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/ArgTools/src/ArgTools.jl:134
        [5] #7
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/src/Downloads.jl:328 [inlined]
        [6] arg_read
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/ArgTools/src/ArgTools.jl:76 [inlined]
        [7] request(url::String; input::Nothing, output::IOBuffer, method::Nothing, headers::Vector{Pair{String, String}}, timeout::Float64, progress::Nothing, verbose::Bool, debug::Nothing, throw::Bool, downloader::Downloads.Downloader)
          @ Downloads /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/src/Downloads.jl:327
        [8] #3
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/src/Downloads.jl:233 [inlined]
        [9] arg_write(f::Downloads.var"#3#4"{Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Nothing, Downloads.Downloader, String}, arg::IOBuffer)
          @ ArgTools /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/ArgTools/src/ArgTools.jl:134
       [10] #download#2
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/src/Downloads.jl:232 [inlined]
       [11] (::Main.Test42Main_Downloads.var"#2#3"{Base.Pairs{Symbol, Downloads.Downloader, Tuple{Symbol}, NamedTuple{(:downloader,), Tuple{Downloads.Downloader}}}, String})(output::IOBuffer)
          @ Main.Test42Main_Downloads /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/setup.jl:13
       [12] sprint
          @ ./strings/io.jl:107 [inlined]
       [13] #download_body#1
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/setup.jl:12 [inlined]
       [14] #download_json#4
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/setup.jl:18 [inlined]
       [15] macro expansion
          @ /cache/build/default-amdci5-0/julialang/julia-master/julia-9bdaabdc82/share/julia/stdlib/v1.9/Downloads/test/runtests.jl:368 [inlined]
       [16] (::Main.Test42Main_Downloads.var"#24#50"{Downloads.Downloader, String, Int32})()
          @ Main.Test42Main_Downloads ./task.jl:501

Example log: https://buildkite.com/julialang/julia-master/builds/14198#01822720-ae76-4ff4-8a35-cec678dfa6a7

DilumAluthge avatar Jul 25 '22 01:07 DilumAluthge

https://github.com/JuliaCI/julia-buildkite/pull/185 may fix this, so would be good to check

vtjnash avatar Jul 25 '22 01:07 vtjnash

This Win64 job failed with

      nested task error: RequestError: Failed to connect to httpbingo.julialang.org port 443 after 21058 ms: Timed out while requesting https://httpbingo.julialang.org/delay/2?id=40

which looks similar enough?

giordano avatar Jul 25 '22 19:07 giordano

Sadly, I am still seeing the original failure (nested task error: RequestError: getaddrinfo() thread failed to start while requesting https://httpbingo.julialang.org/delay/2?id=98) on i686-linux-gnu even after I merged https://github.com/JuliaCI/julia-buildkite/pull/185.

Example log: https://buildkite.com/julialang/julia-master/builds/14331#01823cdc-7767-4c4b-bcc8-976db68868de

In the above log, it says that JULIA_NUM_THREADS was set to 1, and that the value of nthreads() was 1, so it does look like threads were disabled in that job.

DilumAluthge avatar Jul 27 '22 05:07 DilumAluthge

This Win64 job failed with

      nested task error: RequestError: Failed to connect to httpbingo.julialang.org port 443 after 21058 ms: Timed out while requesting https://httpbingo.julialang.org/delay/2?id=40

which looks similar enough?

Hmmm. I don't think that's the same failure.

DilumAluthge avatar Jul 27 '22 07:07 DilumAluthge

I did see the exact same failure as the original post (nested task error: RequestError: getaddrinfo() thread failed to start while requesting https://httpbingo.julialang.org/delay/2?id=1) just now on Buildbot on i686-w64-mingw32.

Log: https://build.julialang.org/#/builders/42/builds/7253

DilumAluthge avatar Jul 27 '22 08:07 DilumAluthge

I should point out that even though this failure didn't go away once I merged https://github.com/JuliaCI/julia-buildkite/pull/185, it certainly did become less frequent.

DilumAluthge avatar Jul 27 '22 08:07 DilumAluthge

I continue to see this on i686-linux-gnu.

@vtjnash @StefanKarpinski Could this be a libuv issue? We are using libuv's getaddrinfo, right?

DilumAluthge avatar Aug 05 '22 05:08 DilumAluthge

Looking closer, the test fails at 20-30 seconds, but the expected maximum time is 180 seconds: https://github.com/JuliaLang/Downloads.jl/issues/204

Thus, we are simply running too many downloads concurrently, and mbedtls is too slow to keep up.

vtjnash avatar Sep 28 '22 16:09 vtjnash

Interestingly, same error as I saw here: https://github.com/JuliaLang/Downloads.jl/issues/196#issuecomment-1255391151. Seems like there's some race condition in libcurl that we're hitting in some cases. In the link, putting a try/catch in a cfunction someone causes it to be triggered more easily.

StefanKarpinski avatar Sep 28 '22 16:09 StefanKarpinski

Ah, sorry, I was responding to https://github.com/JuliaLang/julia/issues/46159#issuecomment-1194557004, which is a different issue (that happens to be in roughly the same place, for related but not identical reasons)

vtjnash avatar Sep 28 '22 17:09 vtjnash

This semaphore may help: https://github.com/JuliaLang/Downloads.jl/pull/205.

StefanKarpinski avatar Oct 01 '22 21:10 StefanKarpinski

You can test this on your own machine (both issues here should turn out to have the same fix) by changing the 100 concurrent attempts to 1500 (or even higher) in this test

vtjnash avatar Oct 03 '22 00:10 vtjnash