Genie.jl icon indicating copy to clipboard operation
Genie.jl copied to clipboard

Genie.Logger.initialize_logging causes slow Arrow writes

Open poulter7 opened this issue 1 year ago • 5 comments

Describe the bug I observe that after calling Genie.Logger.initialize_logging() invocations to Arrow.write() are much slower.

To reproduce Source script:

# slow.jl
using Pkg
Pkg.add("Arrow")
Pkg.add("Genie")
Pkg.add("DataFrames")
using Arrow
using Genie
using DataFrames
if length(ARGS) > 0
    Genie.Logger.initialize_logging()
end
@time Arrow.write(IOBuffer(), DataFrame(rand(2000, 2000), :auto))
@time Arrow.write(IOBuffer(), DataFrame(rand(2000, 2000), :auto))
@time Arrow.write(IOBuffer(), DataFrame(rand(2000, 2000), :auto))

First invocation of the script:

julia ./src/demo/slow.jl
  4.969270 seconds (5.48 M allocations: 461.108 MiB, 3.13% gc time, 95.31% compilation time: 3% of which was recompilation)
  0.213863 seconds (89.38 k allocations: 107.136 MiB, 4.35% gc time)
  0.212987 seconds (89.38 k allocations: 107.136 MiB, 4.44% gc time)

Second invocation of the script:

julia ./src/demo/slow.jl 1
 17.421436 seconds (49.03 M allocations: 6.738 GiB, 3.36% gc time, 34.72% compilation time: 2% of which was recompilation)
 11.818701 seconds (42.00 M allocations: 6.284 GiB, 3.92% gc time)
 12.145096 seconds (42.00 M allocations: 6.275 GiB, 4.37% gc time)

Expected behavior I'd not expect any performance effects on Arrow performance.

Additional context

julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, cascadelake)
  Threads: 1 on 64 virtual cores
Environment:
  JULIA_GPG = 3673DF529D9049477F76B37566E3C7DC03D6E495
  JULIA_VERSION = 1.9.3
  JULIA_PATH = /usr/local/julia

Package version:

pkg> st 
  [69666777] Arrow v2.6.2
  [a93c6f00] DataFrames v1.6.1
  [c43c736e] Genie v5.18.2

poulter7 avatar Sep 14 '23 21:09 poulter7

If I remove global_logger from the chain: LoggingExtras.TeeLogger(LoggingExtras.MinLevelLogger(logger, Genie.config.log_level)) |> timestamp_logger |> global_logger

In: /root/.julia/packages/Genie/.../src/Logger.jl I observe it speeds up again.

poulter7 avatar Sep 14 '23 21:09 poulter7

@poulter7 Wow, thanks, I'll take a look, very interesting.

essenciary avatar Sep 15 '23 07:09 essenciary

The problem is that the created Logger always applies the timestamp_logger transformation even for debug messages. Arrow prints a lot of debug messages and is thus very slow.

@poulter7 can you check if the pr #693 fixes your issue.

antholzer avatar Dec 06 '23 09:12 antholzer

I've merged #693 - let me know if all good and I'll tag a release.

essenciary avatar Dec 06 '23 14:12 essenciary

It's being released now

essenciary avatar Dec 06 '23 17:12 essenciary