dd-trace-py

Is DD_PROFILING_ALLOC_EVENT_BUFFER too small?

Open zdyj3170101136 opened this issue 1 year ago • 6 comments

Summary of problem

Is DD_PROFILING_ALLOC_EVENT_BUFFER too small?

Which version of dd-trace-py are you using?

I use Datadog to collect allocation samples.

I find the output is very different when I set DD_PROFILING_ALLOC_EVENT_BUFFER to 10000.

Which libraries and their versions are you using?

ddtrace-run -v: ddtrace-run 2.5.0

attrs==22.2.0
bytecode==0.13.0
cattrs==1.0.0
chardet==3.0.4
cloudpickle==2.2.1
commonmark==0.9.1
conda==4.6.14
dataclasses==0.8
ddsketch==2.0.4
ddtrace==1.20.16
distro==1.5.0
envier==0.4.0
idna==2.10
importlib-metadata==4.8.3
ipaddress==1.0.23
numpy==1.19.5
pep562==1.1
Pillow==8.4.0
protobuf==3.19.6
pycosat==0.6.3
pycrypto==2.6.1
Pygments==2.14.0
PySocks==1.6.8
PyYAML==3.13
requests==2.14.2
rich==12.6.0
ruamel.yaml==0.13.14
six==1.16.0
typing_extensions==4.1.1
urllib3==1.25.6
xmltodict==0.13.0
zipp==3.6.0

How can we reproduce your problem?

code:

#from ddtrace.profiling import Profiler

#prof = Profiler(
#    url="http://127.0.0.1:8081",
#    enable_code_provenance=True,  # provide library version
#)
#prof.start()  # Should be as early as possible, e.g. before other imports, so everything is profiled
import os
import PIL
from PIL import Image
import time
dir = "/home/centos/pillow-bench/Images_0/0"
onlyfiles = []

filedir = dir
files = [f for f in os.listdir(filedir) if os.path.isfile(os.path.join(filedir, f))]
filepaths = map( lambda x: os.path.join(filedir, x), files )
onlyfiles += list(filepaths)

images = []
s = 0
for i in range(len(onlyfiles)):
    fname = onlyfiles[i]
    image = PIL.Image.open(fname)
    # Do something
    images.append(image)
    image.tobytes("xbm", "rgb")
    if i / 10000 == 1:  # stop after 10,000 iterations
        print('iteration ', i)
        break
#time.sleep(30)
#input("Press Enter to continue...")

first run:

 DD_PROFILING_ENABLED=true  DD_PROFILING_STACK_ENABLED=false DD_PROFILING_LOCK_ENABLED=false  DD_PROFILING_CAPTURE_PCT=100 DD_PROFILING_OUTPUT_PPROF=h.pprof    DD_PROFILING_MEMORY_EVENTS_BUFFER=16 ddtrace-run python3 benchmakr.py

second run:

DD_PROFILING_ENABLED=true  DD_PROFILING_STACK_ENABLED=false DD_PROFILING_LOCK_ENABLED=false  DD_PROFILING_CAPTURE_PCT=100 DD_PROFILING_OUTPUT_PPROF=h.pprof    DD_PROFILING_MEMORY_EVENTS_BUFFER=10000 ddtrace-run python3 benchmakr.py

What is the result that you get?

When the buffer is 16: [screenshots of alloc-space, alloc-samples, and heap-space]

When the buffer is 10000: [screenshots of alloc-space, alloc-samples, and heap-space]

What is the result that you expected?

I noticed that the allocation samples are cleaned up on every collection interval (default 0.5 s), and if more allocation events arrive in an interval than the buffer can hold, they are randomly sampled.

With a small allocation buffer, the reported allocation space is not accurate.
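
To make this concrete, here is a minimal sketch (assumed behavior for illustration, not the actual ddtrace code) of how a fixed-size buffer forces random down-sampling once more allocation events arrive in an interval than it can hold; BUFFER_SIZE and the reservoir logic are illustrative:

import random

# Illustrative only: keep a uniform random subset of the allocation events
# seen during one collection interval, bounded by the buffer size.
BUFFER_SIZE = 16  # mirrors DD_PROFILING_MEMORY_EVENTS_BUFFER=16 from the repro

def sample_interval(allocation_events, buffer_size=BUFFER_SIZE):
    """Keep at most buffer_size events, chosen uniformly at random (Algorithm R)."""
    reservoir = []
    for seen, event in enumerate(allocation_events):
        if len(reservoir) < buffer_size:
            reservoir.append(event)
        else:
            # Replace an existing slot with probability buffer_size / (seen + 1).
            j = random.randint(0, seen)
            if j < buffer_size:
                reservoir[j] = event
    return reservoir

# With ~100,000 allocations per 0.5 s interval and a 16-slot buffer,
# only 16 events survive, so per-stack totals become very noisy.
events = [{"size": 256, "stack": "frame_%d" % (i % 50)} for i in range(100_000)]
print(len(sample_interval(events)))  # -> 16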

zdyj3170101136 avatar Jan 31 '24 02:01 zdyj3170101136

When I read the code, I found that the sampled event count and the sampled allocation space are calculated differently. [screenshots of the relevant code]

With a small event buffer, the effective capture percentage (capture_pct) is small.

In the end, the sampled allocation size equals event.size * alloc_count / count, but alloc_count is the same regardless of buffer size. So when count (the number of captured events) is small, each event is scaled up more. If every event.size were the same, the final heap size would still be correct, but if a captured event happens to be large, the final heap size is overestimated.

The sampled allocation count, on the other hand, equals count, so a larger buffer produces a larger reported sample count.
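
A rough sketch of the scaling described above (my reading of the code; the function and field names here are hypothetical): each captured event's size is scaled by alloc_count / count, so a small buffer lets one unusually large event dominate the estimate, while the reported sample count simply tracks count:

def estimate_alloc_space(captured_events, alloc_count):
    """Scale the captured events up to the full allocation count (hypothetical model)."""
    count = len(captured_events)
    # Each captured event stands in for alloc_count / count real allocations.
    return sum(e["size"] * alloc_count / count for e in captured_events)

alloc_count = 100_000                                        # true allocations this interval
small_buffer = [{"size": 1 << 20}] + [{"size": 64}] * 15     # 16 events, one 1 MiB outlier
large_buffer = [{"size": 1 << 20}] + [{"size": 64}] * 9_999  # 10000 events, same outlier

print(estimate_alloc_space(small_buffer, alloc_count))  # ~6.6 GB: the outlier is scaled by 6250x
print(estimate_alloc_space(large_buffer, alloc_count))  # ~17 MB: the outlier is scaled by 10x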

zdyj3170101136 avatar Jan 31 '24 03:01 zdyj3170101136

The profiler's default recorder max events is 16384.

With a 0.5 s collection interval and a 60 s upload period, alloc_buffer_size * (60 / 0.5) = 16384, which gives roughly 136, so the alloc buffer size could safely be set to 128.
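
Written out (using the 0.5 s collection interval, 60 s upload period, and 16384-event recorder capacity mentioned above; exact defaults may differ by version):

max_events = 16384                  # recorder capacity
collections_per_upload = 60 / 0.5   # 120 flushes per upload period
max_buffer = max_events / collections_per_upload
print(max_buffer)                   # ~136.5, so 128 is a safe power of two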

zdyj3170101136 avatar Jan 31 '24 03:01 zdyj3170101136

I also noticed that all events end up in the same allocation array, even when they have different stack traces.

They all carry the same count, which makes the final result a multiple of the real output.
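
A toy illustration of that concern (this models my reading of the behavior, not the actual recorder code): if every captured event carries the full per-interval count instead of a share of it, summing counts across events multiplies the total by the number of captured events:

true_alloc_count = 100_000
captured = [{"stack": "frame_%d" % i, "count": true_alloc_count} for i in range(16)]

# Summing per-event counts over-reports by a factor of len(captured).
reported = sum(e["count"] for e in captured)
print(reported, "vs", true_alloc_count)  # 1600000 vs 100000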

zdyj3170101136 avatar Jan 31 '24 07:01 zdyj3170101136

The actual memory allocation counts and bytes obtained with tracemalloc:

python3 benchmakr.py
iteration  10000
[ Top 10 ]
/root/miniconda3/lib/python3.11/site-packages/PIL/JpegImagePlugin.py:273: size=12.3 MiB, count=49959, average=259 B
<frozen posixpath>:88: size=10.0 MiB, count=108131, average=97 B
/home/centos/pillow-bench/benchmakr.py:20: size=7241 KiB, count=108132, average=69 B
/root/miniconda3/lib/python3.11/site-packages/PIL/JpegImagePlugin.py:840: size=2969 KiB, count=20002, average=152 B
/root/miniconda3/lib/python3.11/site-packages/PIL/JpegImagePlugin.py:250: size=2419 KiB, count=39957, average=62 B
/root/miniconda3/lib/python3.11/site-packages/PIL/JpegImagePlugin.py:77: size=1171 KiB, count=9992, average=120 B
/root/miniconda3/lib/python3.11/site-packages/PIL/JpegImagePlugin.py:72: size=1168 KiB, count=9963, average=120 B
/root/miniconda3/lib/python3.11/site-packages/PIL/JpegImagePlugin.py:73: size=859 KiB, count=20002, average=44 B
/home/centos/pillow-bench/benchmakr.py:22: size=845 KiB, count=1, average=845 KiB
<frozen importlib._bootstrap_external>:729: size=699 KiB, count=6420, average=111 B
Total allocation count is 537063

code:

#from ddtrace.profiling import Profiler

#prof = Profiler(
#    url="http://127.0.0.1:8081",
#    enable_code_provenance=True,  # provide library version
#)
#prof.start()  # Should be as early as possible, e.g. before other imports, so everything is profiled
import os
import PIL
from PIL import Image
import time
dir = "/home/centos/pillow-bench/Images_0/0"
onlyfiles = []

import tracemalloc

tracemalloc.start()

filedir = dir
files = [f for f in os.listdir(filedir) if os.path.isfile(os.path.join(filedir, f))]
filepaths = map( lambda x: os.path.join(filedir, x), files )
onlyfiles += list(filepaths)

images = []
s = 0
for i in range(len(onlyfiles)):
    fname = onlyfiles[i]
    #print(fname)
    image = PIL.Image.open(fname)
    # Do something
    images.append(image)
    image.tobytes("xbm", "rgb")
    if i / 10000 == 1:  # stop after 10,000 iterations
        print('iteration ', i)
        break
#time.sleep(30)
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

print("[ Top 10 ]")
for stat in top_stats[:10]:
    print(stat)
total_count = sum(stat.count for stat in top_stats)
print(f"Total allocation count is {total_count}")

zdyj3170101136 avatar Jan 31 '24 07:01 zdyj3170101136

Thanks for the report, @zdyj3170101136. Do you think it's possible that your application requires a larger buffer than the default? Maybe there's some documentation we could change to clarify this point?

cc @sanchda

emmettbutler avatar Jan 31 '24 14:01 emmettbutler

Sure. Maybe the event buffer size could be set dynamically from the reporter buffer size and the flush interval.
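
For example, something along these lines (purely hypothetical; suggested_alloc_buffer is not an existing ddtrace API) could derive the allocation buffer size from the recorder capacity and the ratio of upload period to collection interval:

def suggested_alloc_buffer(recorder_max_events=16384,
                           upload_period_s=60.0,
                           collect_interval_s=0.5):
    """Pick an allocation buffer size that fills, but does not overflow,
    the recorder over one upload period. All numbers here are assumptions."""
    flushes_per_upload = upload_period_s / collect_interval_s
    return max(1, int(recorder_max_events // flushes_per_upload))

print(suggested_alloc_buffer())  # 136 with the defaults above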

zdyj3170101136 avatar Feb 02 '24 05:02 zdyj3170101136

This issue has been automatically closed after a period of inactivity. If it's a feature request, it has been added to the maintainers' internal backlog and will be included in an upcoming round of feature prioritization. Please comment or reopen if you think this issue was closed in error.

github-actions[bot] avatar May 08 '24 00:05 github-actions[bot]