Dagger.jl
Dagger.jl copied to clipboard
Logging is slow
In a single multithreaded process, logging with MultiEventLog brings massive overheads. For 8 threads and 40 graphs ~800 vertices each we observe ~7x application slow down and ~26x increase of lock conflicts. With the overhead this big logging isn't very useful as it considerably alters the application (way more time is spent on logging than on running the actual application)
Without logging
Pipeline (throughput 1.00 events/s): 40.097314 seconds (62.55 M allocations: 3.252 GiB, 3.20% gc time, 128376 lock conflicts, 0.47% compilation time)
With logging
Dagger.enable_logging!(tasknames = true,
taskfuncnames = true,
taskdeps = true,
taskargs = true,
taskargmoves = true,
taskresult = true,
taskuidtotid = true,
tasktochunk = true)
Pipeline (throughput 0.14 events/s): 290.967248 seconds (233.25 M allocations: 19.686 GiB, 4.86% gc time, 3346075 lock conflicts, 0.00% compilation time)
Can you provide an MWE? Also, please try with metrics=false and let me know if this helps significantly - I probably need to just disable that one by default, since it makes OS calls that can be expensive.
Thanks, indeed metrics=false seems to help significantly. Now it's down to 40% slow down, which is still a bit high
Pipeline (throughput 0.71 events/s): 56.636791 seconds (106.55 M allocations: 8.984 GiB, 7.98% gc time, 1045838 lock conflicts, 0.00% compilation time)
I'll try to return with a MWE later
#626 has disabled metrics and enabled all_task_deps by default.
Still, it would be worth investigating some improvements to logging; in particular, doing some work to improve type stability, and trying to avoid the context-global log lock, would probably help significantly.
For timeline profiling (:plots_gantt, :chrome_trace) we switched to NVTX.jl, the overheads seems to be <1%. But still Dagger logging is very helpful for other things as it contains a lot information that won't be accessible otherwise like dependencies between scheduled tasks