Genie.jl
Genie.jl copied to clipboard
Genie.Logger.initialize_logging causes slow Arrow writes
Describe the bug
I observe that after calling Genie.Logger.initialize_logging()
invocations to Arrow.write()
are much slower.
To reproduce Source script:
# slow.jl
using Pkg
Pkg.add("Arrow")
Pkg.add("Genie")
Pkg.add("DataFrames")
using Arrow
using Genie
using DataFrames
if length(ARGS) > 0
Genie.Logger.initialize_logging()
end
@time Arrow.write(IOBuffer(), DataFrame(rand(2000, 2000), :auto))
@time Arrow.write(IOBuffer(), DataFrame(rand(2000, 2000), :auto))
@time Arrow.write(IOBuffer(), DataFrame(rand(2000, 2000), :auto))
First invocation of the script:
julia ./src/demo/slow.jl
4.969270 seconds (5.48 M allocations: 461.108 MiB, 3.13% gc time, 95.31% compilation time: 3% of which was recompilation)
0.213863 seconds (89.38 k allocations: 107.136 MiB, 4.35% gc time)
0.212987 seconds (89.38 k allocations: 107.136 MiB, 4.44% gc time)
Second invocation of the script:
julia ./src/demo/slow.jl 1
17.421436 seconds (49.03 M allocations: 6.738 GiB, 3.36% gc time, 34.72% compilation time: 2% of which was recompilation)
11.818701 seconds (42.00 M allocations: 6.284 GiB, 3.92% gc time)
12.145096 seconds (42.00 M allocations: 6.275 GiB, 4.37% gc time)
Expected behavior I'd not expect any performance effects on Arrow performance.
Additional context
julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 64 × Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, cascadelake)
Threads: 1 on 64 virtual cores
Environment:
JULIA_GPG = 3673DF529D9049477F76B37566E3C7DC03D6E495
JULIA_VERSION = 1.9.3
JULIA_PATH = /usr/local/julia
Package version:
pkg> st
[69666777] Arrow v2.6.2
[a93c6f00] DataFrames v1.6.1
[c43c736e] Genie v5.18.2
If I remove global_logger
from the chain:
LoggingExtras.TeeLogger(LoggingExtras.MinLevelLogger(logger, Genie.config.log_level)) |> timestamp_logger |> global_logger
In: /root/.julia/packages/Genie/.../src/Logger.jl I observe it speeds up again.
@poulter7 Wow, thanks, I'll take a look, very interesting.
The problem is that the created Logger always applies the timestamp_logger
transformation even for debug messages. Arrow
prints a lot of debug messages and is thus very slow.
@poulter7 can you check if the pr #693 fixes your issue.
I've merged #693 - let me know if all good and I'll tag a release.
It's being released now