Unable to get heap profile

Open gu0keno0 opened this issue 9 months ago • 5 comments

Hi, I've been trying to get Envoy heap profiling to work, with no luck so far. I've tested the following:

  1. Using the admin endpoint /heap_dump. I tried this with three binaries: our in-house build of Envoy, the envoyproxy/envoy:v1.33-latest container, and istio/proxyv2:1.25.1-debug. The dump is only ~2KB in size and contains little information:
root@2dc295247198:/# ls envoy.heap -l
-rw-r--r-- 1 root root 2011 Mar 27 13:28 envoy.heap
root@2dc295247198:/# curl http://localhost:9901/memory
{
 "allocated": "32911672",
 "heap_size": "54525952",
 "pageheap_unmapped": "0",
 "pageheap_free": "4161536",
 "total_thread_cache": "15860608",
 "total_physical_bytes": "60297694"
}
root@2dc295247198:/# curl http://localhost:9901/heap_dump -o envoy.heap
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2094    0  2094    0     0   2384      0 --:--:-- --:--:-- --:--:--  2382
root@2dc295247198:/# ls -l envoy.heap
-rw-r--r-- 1 root root 2094 Mar 27 13:29 envoy.heap
root@2dc295247198:/# go tool pprof /usr/local/bin/envoy envoy.heap
File: envoy
Type: space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) text
Showing nodes accounting for 20.94MB, 100% of 20.94MB total
      flat  flat%   sum%        cum   cum%
   20.94MB   100%   100%    20.94MB   100%  [envoy]
         0     0%   100%    20.94MB   100%  [libc.so.6]
(pprof)
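
The mismatch in the session above (about 32 MB reported as allocated, but a dump of roughly 2 KB) can be checked quickly by pulling single fields out of the /memory response. This is only a minimal sketch: `memory_field` is a hypothetical helper name, and it assumes the flat `"key": "value"` layout shown in the output above rather than using a real JSON parser.

```shell
# memory_field NAME
# Reads the JSON printed by the /memory admin endpoint on stdin and
# prints the numeric value of the field NAME.
# Hypothetical helper: relies on one "key": "value" pair per line.
memory_field() {
  sed -n "s/.*\"$1\": *\"\([0-9][0-9]*\)\".*/\1/p"
}
```

For example, `curl -s http://localhost:9901/memory | memory_field allocated` prints the allocated byte count, which can then be compared against the size reported by `ls -l envoy.heap`.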
  2. Using gperftools and tcmalloc. I tried to build our in-house version with gperftools and tcmalloc as described in https://github.com/envoyproxy/envoy/blob/main/bazel/PPROF.md#collecting-the-profile (I tried several combinations of build options from https://github.com/envoyproxy/envoy/pull/21160). The build command was:
CC=clang CXX=clang++ /usr/local/bin/bazel build -c dbg --copt=-g --strip=never --linkopt=-Wl,--no-rosegment --extra_toolchains=@local_jdk//:all --cxxopt -D_GLIBCXX_USE_CXX11_ABI=1 --cxxopt -DENVOY_IGNORE_GLIBCXX_USE_CXX11_ABI_ERROR=1 --define tcmalloc=gperftools envoy

I launched Envoy with the gperftools environment variables set:

HEAPPROFILE=/tmp/envoy.heap HEAPPROFILESIGNAL=12 envoy-static -c ~/envoy-min.yaml --concurrency 2 2>&1
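
With these variables, each dump lands in a numbered file under the HEAPPROFILE prefix (the session further down shows /tmp/envoy.heap.0057.heap), and signal 12 is SIGUSR2 on Linux x86-64. The following is a minimal sketch for picking up the most recent dump after sending the signal. `latest_heap_dump` is a hypothetical helper name, and it assumes the zero-padded four-digit sequence number seen in this thread, so plain lexical sorting is enough.

```shell
# latest_heap_dump PREFIX
# Prints the highest-numbered dump file matching PREFIX.NNNN.heap,
# or nothing if no dump exists yet.
# Hypothetical helper: assumes the sequence number stays zero-padded
# to four digits, as in /tmp/envoy.heap.0057.heap.
latest_heap_dump() {
  ls -1 "$1".*.heap 2>/dev/null | sort | tail -n 1
}
```

A typical sequence would be `kill -12 <envoy pid>` followed by `latest_heap_dump /tmp/envoy.heap`, passing the printed path to pprof.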

I was able to trigger dumps this way, but pprof seems to have trouble locating symbols:

coder [ ~ ]$ env | grep 'PPROF_BINARY_PATH'
PPROF_BINARY_PATH=/home/coder/envoy-build/.bazel_envoy_cache/coder/da311d67ca475f55784fc7b1dd8a320c/execroot/envoy/bazel-out/k8-dbg/bin/source/exe/
coder [ ~ ]$ ls -l /tmp/envoy.heap.0057.heap
-rw-rw-r-- 1 coder coder 1048564 Mar 27 01:40 /tmp/envoy.heap.0057.heap
coder [ ~ ]$ go tool pprof -nodefraction=0  -nodecount=99999 /home/coder/envoy-build/.bazel_envoy_cache/coder/da311d67ca475f55784fc7b1dd8a320c/execroot/envoy/bazel-out/k8-dbg/bin/source/exe/envoy-static  /tmp/envoy.heap.0057.heap
Some binary filenames not available. Symbolization may be incomplete.
Try setting PPROF_BINARY_PATH to the search path for local binaries.
File: envoy-static
Type: inuse_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) text

         0     0%   100%     0.70kB 0.0063%  DeleteHook
         0     0%   100%     0.87kB 0.0078%  DoAllocWithArena
         0     0%   100%    13.44kB  0.12%  MaybeDumpProfileLocked
         0     0%   100%     1.11kB  0.01%  NewHook
         0     0%   100%     0.31kB 0.0028%  RecordAlloc
         0     0%   100%     4.54kB 0.041%  __copy_move_a2
         0     0%   100%        4kB 0.036%  __equal_aux1
         0     0%   100%     1.28kB 0.012%  __memcmp
         0     0%   100%     1.40kB 0.013%  allocate_full_cpp_throw_oom
         0     0%   100%     1.27kB 0.011%  capture
         0     0%   100%     4.90kB 0.044%  copy
         0     0%   100%       12kB  0.11%  epoll_dispatch
         0     0%   100%     1.50kB 0.014%  epoll_init
         0     0%   100%    23.75kB  0.21%  event_add
         0     0%   100%    24.75kB  0.22%  event_add_nolock_
         0     0%   100%  4255.92kB 38.35%  event_base_loop
         0     0%   100%     5.86kB 0.053%  event_base_new
         0     0%   100%     5.86kB 0.053%  event_base_new_with_config
         0     0%   100%  4337.35kB 39.09%  event_persist_closure
         0     0%   100%  4290.53kB 38.66%  event_process_active
         0     0%   100%  4299.90kB 38.75%  event_process_active_single_queue
         0     0%   100%    24.50kB  0.22%  evmap_io_add_
         0     0%   100%     9.25kB 0.083%  evmap_make_space
         0     0%   100%     0.25kB 0.0023%  evmap_signal_add_
         0     0%   100%        1kB 0.009%  evthread_make_base_notifiable
         0     0%   100%        1kB 0.009%  evthread_make_base_notifiable_nolock_
         0     0%   100%     2.19kB  0.02%  invoke_hooks_and_free
(pprof)

This is slightly better than the first attempt, but important functions in Envoy's HTTP stack are still not shown. It looks like a symbol issue, because only libevent functions are resolved correctly.

In these tests, I was running ab in the background to generate load, with the following minimal Envoy config:

admin:
  access_log_path: /dev/null
  address:
    socket_address:
      address: 127.0.0.1
      port_value: 9901

static_resources: {}

I also tried adding a listener and an upstream cluster to exercise more code paths, but the heap profile results were the same:

cat /tmp/envoy-more.yaml

                      domains: ["*"]
                      routes:
                        - match:
                            path: "/config_dump"
                          route:
                            cluster: admin_cluster
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                http_protocol_options:
                  accept_http_10: true
  clusters:
    - name: admin_cluster
      connect_timeout: 0.25s
      type: LOGICAL_DNS
      dns_lookup_family: V4_ONLY
      load_assignment:
        cluster_name: admin_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.1
                      port_value: 9901

Let me know if I'm missing something. We'd like to enable heap profiling in production as well.

gu0keno0, Mar 27 '25 14:03

I will try to take a look this weekend.

wbpcode, Mar 31 '25 14:03

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot], Apr 30 '25 16:04

Bump.

gu0keno0, May 05 '25 18:05

@wbpcode @gu0keno0 we've been wrestling with the same problem and haven't found a solution. Any updates?

irlevesque, May 29 '25 20:05

Hi, I got some free time this weekend and took a look. Everything seems to work fine. I rebuilt Envoy in my local dev container and bootstrapped it with a simple demo YAML.

Are you sure your binary contains symbols, i.e. that it is the unstripped version?

# pprof ./bazel-bin/source/exe/envoy-static /tmp/envoy.heap
File: envoy-static
Build ID: e0e8c299558362ca0a9869b89f689f9f53952461
Type: space
Time: 2025-06-15 14:48:15 UTC
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) text
Showing nodes accounting for 6.10MB, 100% of 6.10MB total
Showing top 10 nodes out of 31
      flat  flat%   sum%        cum   cum%
    6.10MB   100%   100%     6.10MB   100%  std::__1::basic_string::__append_default_init[abi:ne180100]
         0     0%   100%     1.97MB 32.31%  Envoy::Config::TypedFactory::configTypes
         0     0%   100%     6.10MB   100%  Envoy::MainCommon::MainCommon
         0     0%   100%     6.10MB   100%  Envoy::MainCommon::main
         0     0%   100%     6.10MB   100%  Envoy::MainCommonBase::MainCommonBase
         0     0%   100%     4.13MB 67.69%  Envoy::ProcessWide::ProcessWide
         0     0%   100%     1.97MB 32.31%  Envoy::Registry::FactoryRegistry::buildFactoriesByType
         0     0%   100%     1.97MB 32.31%  Envoy::Registry::FactoryRegistry::registeredTypes
         0     0%   100%     1.97MB 32.31%  Envoy::Registry::FactoryRegistryProxyImpl::registeredTypes
         0     0%   100%     1.97MB 32.31%  Envoy::Server::InstanceBase::initialize
(pprof)  
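
One way to answer the question about symbols is to look at what the `file` utility reports for the binary. The following is a minimal sketch: `symbol_status` is a hypothetical helper name, and it assumes the usual "stripped", "not stripped" and "with debug_info" wording, which may differ between versions of `file`.

```shell
# symbol_status
# Reads the output of `file` for an ELF binary on stdin and prints one of:
#   debug_info  (debug sections present)
#   symbols     (symbol table present, no debug info reported)
#   stripped    (symbol table removed)
#   unknown     (wording not recognised)
# Hypothetical helper: the matched phrases are an assumption about
# the wording used by `file`.
symbol_status() {
  _file_out=$(cat)
  case "$_file_out" in
    *"with debug_info"*) echo debug_info ;;
    *"not stripped"*)    echo symbols ;;
    *stripped*)          echo stripped ;;
    *)                   echo unknown ;;
  esac
}
```

For example, `file -L ./bazel-bin/source/exe/envoy-static | symbol_status` should print something other than `stripped` for a binary that pprof can symbolize.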

wbpcode, Jun 15 '25 15:06

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot], Jul 15 '25 16:07

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

github-actions[bot], Jul 22 '25 20:07