
Significant overhead/latency (about 50ms)

Open KronosTheLate opened this issue 1 year ago • 5 comments

I mentioned in a comment on this issue that I had latency problems when using RemoteREPL with my Raspberry Pi. I have now checked on localhost, so with no SSH and everything running on the same modern computer, and there is STILL almost 50 ms of latency from just evaluating 1 and returning the result:

julia> @btime @remote 1
  43.513 ms (66 allocations: 3.61 KiB)
1

Running using ProfileView and then @profview @remote 1 gives the following flamegraph: [image: flamegraph]

From the top, the call-sites that make up the flamegraph are

./task.jl:795, MethodInstance for poptask(::Base.InvasiveLinkedListSynchronized{Task})
./task.jl:804, MethodInstance for wait()
./condition.jl:106, MethodInstance for wait(::Base.GenericCondition{Base.Threads.SpinLock})
./stream.jl:413, MethodInstance for wait_readnb(::Sockets.TCPSocket, ::Int64)
./stream.jl:106, eof [inlined]
./stream.jl:925, MethodInstance for read(::Sockets.TCPSocket, ::Type{UInt8})
/buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Serialization/src/Serialization.jl:782, deserialize [inlined]
/buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Serialization/src/Serialization.jl:769, MethodInstance for deserialize(::Sockets.TCPSocket)
/home/dennishb/.julia/packages/RemoteREPL/BFqrB/src/client.jl:207, MethodInstance for var"#send_and_receive#40"(::Bool, ::typeof(RemoteREPL.send_and_receive), ::RemoteREPL.Connection, ::Tuple{Symbol, Int64})
/home/dennishb/.julia/packages/RemoteREPL/BFqrB/src/client.jl:199, send_and_receive [inlined]
/home/dennishb/.julia/packages/RemoteREPL/BFqrB/src/client.jl:382, MethodInstance for (::RemoteREPL.var"#47#48"{RemoteREPL.Connection, Int64})()
/home/dennishb/.julia/packages/RemoteREPL/BFqrB/src/client.jl:178, MethodInstance for var"#ensure_connected!#39"(::Int64, ::typeof(RemoteREPL.ensure_connected!), ::RemoteREPL.var"#47#48"{RemoteREPL.Connection, Int64}, ::RemoteREPL.Connection)
/home/dennishb/.julia/packages/RemoteREPL/BFqrB/src/client.jl:174, ensure_connected! [inlined]
/home/dennishb/.julia/packages/RemoteREPL/BFqrB/src/client.jl:380, MethodInstance for remote_eval_and_fetch(::RemoteREPL.Connection, ::Int64)
./boot.jl:360, eval [inlined]

I am not sure whether this can be improved, or whether this wait time is unavoidable when dealing with networks. Either way, it would be worth investigating whether the ~50 ms of latency on every remote call can be avoided.

KronosTheLate avatar Jun 21 '23 13:06 KronosTheLate

https://github.com/JuliaLang/julia/issues/31842 ?

xgdgsc avatar Jun 22 '23 07:06 xgdgsc

Naively 50 ms seems pretty crazy high on the loopback interface?

I expect this is more a Julia issue than a problem in this package, but if we can invent a workaround that's great. Thanks @xgdgsc for the link :-)

c42f avatar Jun 23 '23 05:06 c42f

The linked issue has a comment where Jeff says that the culprit is the "Nagle algorithm". It can be disabled:

help?> Sockets.nagle
  nagle(socket::Union{TCPServer, TCPSocket}, enable::Bool)


  Enables or disables Nagle's algorithm on a given TCP server or socket.

  │ Julia 1.3
  │
  │  This function requires Julia 1.3 or later.

Should we use Sockets.nagle to disable this algorithm by default? I imagine we generally do not want to pay a 50 ms delay per call just to send fewer packets, especially on a communication channel that is not shared by multiple people.
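
For reference, a minimal sketch of what disabling it could look like on a plain loopback connection. This is not RemoteREPL's code: the echo server, the port hint, and the variable names are assumptions made up for illustration. Only Sockets.nagle itself is taken from the docstring above.

```julia
using Sockets

# Hypothetical throwaway echo server on the loopback interface.
# listenany picks a free port at or above the hint and returns (port, server).
port, server = listenany(ip"127.0.0.1", 27754)

@async begin
    sock = accept(server)
    # Disable Nagle's algorithm on the accepted (server-side) socket.
    Sockets.nagle(sock, false)
    # Echo every byte back until the client closes the connection.
    while !eof(sock)
        write(sock, read(sock, UInt8))
    end
end

client = connect(ip"127.0.0.1", port)
# Disable Nagle's algorithm on the client socket as well, so small
# writes are sent immediately instead of being held back for coalescing.
Sockets.nagle(client, false)

write(client, 0x2a)          # send a single byte
reply = read(client, UInt8)  # wait for the echo

close(client)
close(server)
```

In a package like this one, the equivalent call would presumably go right after connect on the client and right after accept on the server, so that small messages in both directions are sent without delay.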

KronosTheLate avatar Oct 24 '23 12:10 KronosTheLate

Correct, you should not be using Nagle's algorithm for interactive sockets - it's intended for high-bandwidth, high-latency TCP connections (such as data downloads).

jpsamaroo avatar Oct 24 '23 13:10 jpsamaroo

PR created. The effect was a 74x reduction in overhead, from adding a single line!

KronosTheLate avatar Oct 24 '23 14:10 KronosTheLate