bytehound
Long running server
Hi. I want to use bytehound to profile a long-running cloud server.
- I'm starting the application with the following environment:
MEMORY_PROFILER_CULL_TEMPORARY_ALLOCATIONS=1 \
MEMORY_PROFILER_TEMPORARY_ALLOCATION_LIFETIME_THRESHOLD=60000 \
MEMORY_PROFILER_OUTPUT=/mnt/data/memprof_%e_%t_%p_%n.mem \
MEMORY_PROFILER_DISABLE_BY_DEFAULT=1 \
MEMORY_PROFILER_LOG=info \
MEMORY_PROFILER_LOGFILE=/mnt/data/bytehound_%e_%p.log \
LD_PRELOAD=/usr/local/lib/libbytehound.so \
/path/to/nemo
- After a moment I enable profiling with:
pkill --signal SIGUSR1 app
- I wait 10 seconds
- I disable profiling with:
pkill --signal SIGUSR1 app
The logged events show that this is what is happening; however, the created profile /mnt/data/memprof_nemo_1644512060_7_0.mem
keeps growing long after profiling was disabled (15 minutes later).
What am I doing wrong?
I was under the impression I could start/stop profiling multiple times, at different moments of my application execution, and have a different profile created for each of them.
bytehound_nemo_7.log content:
0007 0007 INF Version: 0.8.0
0007 0007 INF Options:
0007 0007 INF MEMORY_PROFILER_BASE_SERVER_PORT = 8100
0007 0007 INF MEMORY_PROFILER_CHOWN_OUTPUT_TO = None
0007 0007 INF MEMORY_PROFILER_DISABLE_BY_DEFAULT = true
0007 0007 INF MEMORY_PROFILER_ENABLE_BROADCAST = false
0007 0007 INF MEMORY_PROFILER_ENABLE_SERVER = false
0007 0007 INF MEMORY_PROFILER_GRAB_BACKTRACES_ON_FREE = false
0007 0007 INF MEMORY_PROFILER_INCLUDE_FILE = None
0007 0007 INF MEMORY_PROFILER_OUTPUT = "/mnt/data/memprof_%e_%t_%p_%n.mem"
0007 0007 INF MEMORY_PROFILER_REGISTER_SIGUSR1 = true
0007 0007 INF MEMORY_PROFILER_REGISTER_SIGUSR2 = true
0007 0007 INF MEMORY_PROFILER_USE_PERF_EVENT_OPEN = true
0007 0007 INF MEMORY_PROFILER_USE_SHADOW_STACK = true
0007 0007 INF MEMORY_PROFILER_WRITE_BINARIES_TO_OUTPUT = true
0007 0007 INF MEMORY_PROFILER_ZERO_MEMORY = false
0007 0007 INF MEMORY_PROFILER_GATHER_MMAP_CALLS = false
0007 0007 INF MEMORY_PROFILER_BACKTRACE_CACHE_SIZE_LEVEL_1 = 16384
0007 0007 INF MEMORY_PROFILER_BACKTRACE_CACHE_SIZE_LEVEL_2 = 327680
0007 0007 INF MEMORY_PROFILER_CULL_TEMPORARY_ALLOCATIONS = true
0007 0007 INF MEMORY_PROFILER_TEMPORARY_ALLOCATION_LIFETIME_THRESHOLD = 60000
0007 0007 INF MEMORY_PROFILER_TEMPORARY_ALLOCATION_PENDING_THRESHOLD = None
0007 0007 INF Setting atexit hook...
0007 0007 INF Registering SIGUSR1 handler...
0007 0007 INF Registering SIGUSR2 handler...
0007 0007 INF Startup initialization finished
0007 0007 INF Signal handler triggered with signal: SIGUSR1 (10)
0007 0007 INF Tracing will be toggled ON (for the first time)
0007 0018 INF Spawning event processing thread...
0007 0130 INF Starting event thread...
0007 0130 INF Data ID: b3e2da0ddf32c424bd7414740d025a80
0007 0130 INF File '/mnt/data/memprof_nemo_1644512060_7_0.mem' opened for writing
0007 0130 INF Writing initial header...
0007 0130 INF Writing wall clock...
0007 0130 INF Writing uptime...
0007 0130 INF Writing environ...
0007 0130 INF Writing maps...
0007 0130 INF Writing binaries...
0007 0018 INF Couldn't find jemalloc in the executable's address space
0007 0018 INF Tracing was enabled
0007 0130 INF Flushing...
0007 0007 INF Signal handler triggered with signal: SIGUSR1 (10)
0007 0007 INF Tracing will be toggled OFF
0007 0026 INF Reloading address space
0007 0026 INF Reloading address space
0007 0026 INF Reloading address space
0007 0026 INF Reloading address space
0007 000a INF Reloading address space
0007 0023 INF Reloading address space
0007 0020 INF Reloading address space
0007 0020 INF Reloading address space
0007 0012 INF Reloading address space
0007 0007 INF Signal handler triggered with signal: SIGUSR1 (10)
0007 0007 INF Tracing will be toggled ON
0007 0007 INF Signal handler triggered with signal: SIGUSR1 (10)
0007 0007 INF Tracing will be toggled OFF
This is expected. Previously, starting/stopping profiling would create a new file, but that was problematic for several reasons. The current behavior is that there's always only one file; if you stop profiling, data is no longer gathered for new allocations, but any allocation which was made before you stopped profiling will still be tracked.
In general if you set a high enough MEMORY_PROFILER_TEMPORARY_ALLOCATION_LIFETIME_THRESHOLD
you should be able to profile a process over a long time without having to start/stop anything.
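For example, an invocation along these lines (a sketch reusing the environment from the original report; the 600000 ms threshold is only an illustration, and MEMORY_PROFILER_DISABLE_BY_DEFAULT is omitted so tracing is on from startup):
MEMORY_PROFILER_CULL_TEMPORARY_ALLOCATIONS=1 \
MEMORY_PROFILER_TEMPORARY_ALLOCATION_LIFETIME_THRESHOLD=600000 \
MEMORY_PROFILER_OUTPUT=/mnt/data/memprof_%e_%t_%p_%n.mem \
MEMORY_PROFILER_LOG=info \
MEMORY_PROFILER_LOGFILE=/mnt/data/bytehound_%e_%p.log \
LD_PRELOAD=/usr/local/lib/libbytehound.so \
/path/to/nemo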
Okay, thanks. I've just tried with MEMORY_PROFILER_TEMPORARY_ALLOCATION_LIFETIME_THRESHOLD=600000
(10 minutes) rather than the 1 minute I had previously configured; however, I'm now seeing a bunch of these lines in the log:
0007 0052 ERR Failed to fully unwind; some of the stack traces will be incomplete. (Are you using a JIT compiler?)
I was not seeing these previously.
That should be unrelated, I think.
The program you're trying to profile might be doing something weird. What's it written in?
It's written in Rust, with bindings to RocksDB (C++) and Postgres (C) though.
Strangely, the program I'm trying to profile always gets the same messages as posted above:
0007 0052 ERR Failed to fully unwind; some of the stack traces will be incomplete. (Are you using a JIT compiler?)
This is a Rust program built with frame pointers and debug info, and I left the optimization level in Cargo at 3, since I was able to obtain useful traces before. Could that come from my configuration?
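For illustration, a build matching that description might look roughly like this (an assumption about the exact flags; opt-level 3 is already the default for --release, and debug info could equally be set via the Cargo profile instead of RUSTFLAGS):
# assumed build command: keep debug info and force frame pointers
RUSTFLAGS="-C force-frame-pointers=yes -C debuginfo=2" \
cargo build --release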
If that helps: each time I run the application with profiling enabled, it starts with a long hanging period (looks like a lock/deadlock to me) before resuming normal operations, then hangs again, etc. The profile I get is usable and seems correct. Could this be related to the error message I'm getting (about the incompletely unwound stacks)? @koute
When that happens, this is what perf top says:
Samples: 595K of event 'cycles', Event count (approx.): 336310883184
Overhead Shared Object Symbol
23,63% libbytehound.so [.] bytehound::allocation_tracker::on_allocation
20,53% libbytehound.so [.] hashbrown::map::HashMap<K,V,S,A>::get_mut
6,51% libbytehound.so [.] bytehound::unwind::grab
5,16% libbytehound.so [.] bytehound::ordered_map::OrderedMap<K,V,S>::remove
5,09% libbytehound.so [.] <nwind::arch::amd64::Arch as nwind::arch::Architecture>::unwind
4,11% libbytehound.so [.] parking_lot::raw_rwlock::RawRwLock::lock_shared_slow
3,11% libbytehound.so [.] bytehound::allocation_tracker::flush_pending
2,72% REDACTED [.] lz4_flex::block::compress::compress_internal
2,50% libbytehound.so [.] bytehound::allocation_tracker::on_free
The apparent hanging might be related to the fact that your executable is dumped, compressed and written into the output file at the start (so that later during analysis the stack traces can be decoded), and if you're building your application with full debug info and the file's big then that might take some time.
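One way to check that hypothesis might be a test run with the binary embedding turned off, assuming the MEMORY_PROFILER_WRITE_BINARIES_TO_OUTPUT option visible in the log above behaves the way its name suggests (the matching binaries would then need to be available separately when analyzing the profile):
MEMORY_PROFILER_WRITE_BINARIES_TO_OUTPUT=0 \
LD_PRELOAD=/usr/local/lib/libbytehound.so \
/path/to/nemo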
Nevertheless, that should be unrelated to the error, for which there could be many potential reasons. Ideally, if you could create a minimal reproduction where this error still gets printed, then we could diagnose it. Alternatively, you could recompile bytehound with debug-log support, turn on trace logs and paste those in; that could shed some light on what's happening, but those logs will most certainly be huge, hard to analyze, and might contain potentially sensitive information which you might not want to share, so a minimal reproduction would be ideal.
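For reference, such a trace run might be invoked like this, reusing the variables from the report above and assuming libbytehound.so points at the debug-enabled build (the build step itself isn't shown here):
MEMORY_PROFILER_LOG=trace \
MEMORY_PROFILER_LOGFILE=/mnt/data/bytehound_%e_%p.log \
LD_PRELOAD=/usr/local/lib/libbytehound.so \
/path/to/nemo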
The apparent hanging might be related to the fact that your executable is dumped, compressed and written into the output file at the start (so that later during analysis the stack traces can be decoded), and if you're building your application with full debug info and the file's big then that might take some time.
OK, but the application doesn't only hang at the start. It hangs, then resumes, then hangs again, and so on, until I send a SIGUSR1
signal to stop profiling; then it immediately resumes normal operations and never hangs again. So from what I can tell, it's not just the dumping and compressing of the binary into the output file.
I'll see if I manage to create a small reproducer for you to look at. In any case, thank you for your help.
OK, but the application doesn't only hang at the start. It hangs, then resumes, then hangs again, and so on, until I send a SIGUSR1 signal to stop profiling,
Okay, that shouldn't happen. Sorry, this might be a stupid question, but you aren't compiling bytehound in debug mode, are you? (:
Also, if you could answer these:
- How many allocations/sec is your application making, approximately?
- On how many threads concurrently is it actively running?
Sorry, this might be a stupid question, but you aren't compiling bytehound in debug mode, are you? (:
No, I'm using the binary release from GitHub directly.
- I don't know how many allocs/sec, but we're talking about a cloud application using up to 400 GB of memory.
- It's running on a 48-thread machine, with all threads actively used.