Duplicate entries in `memray table --leaks` reporter.
Current Behavior
I am testing Memray's ability to identify memory leaks in Python programs that use C extensions. I have a program that uses a library with a known memory leak in its extension code, and I am running Memray in several configurations to find the one that shows the leak most prominently.
When I run my program under memray with --native mode for 1 hour, the profile file is about 2 GB without aggregated mode, or 6 MB in aggregated mode (yay for this feature!).
When I create a table report, it has about 25k rows, yet many of the rows are duplicates. I converted the HTML report to a CSV file and then did further aggregation to remove duplicates by adding together allocations that happened in the same place.
This reduced the number of rows to ~200; when I removed the Thread_ID column and aggregated the entries again, I ended up with only 50 rows and had meaningful information about my leak.
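For reference, the post-processing was roughly the following (a minimal sketch using pandas; the file name is hypothetical and the column names are those of the exported CSV shown below):

```python
# Minimal sketch of the post-processing described above (pandas-based); the
# file name is hypothetical and the columns match the CSV exported from the table report.
import pandas as pd

df = pd.read_csv("memray-table.csv")

# First pass: merge rows that share (Thread_ID, Allocator, Location), summing
# Size and Allocations. This took the table from ~25k rows down to ~200.
per_thread = (
    df.groupby(["Thread_ID", "Allocator", "Location"], as_index=False)[
        ["Size", "Allocations"]
    ].sum()
)

# Second pass: drop Thread_ID and aggregate again, leaving ~50 rows.
per_location = (
    per_thread.groupby(["Allocator", "Location"], as_index=False)[
        ["Size", "Allocations"]
    ].sum()
)

print(per_location.sort_values("Size", ascending=False).to_string(index=False))
```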
Here are sample entries from the table that demonstrate the duplication:
Thread_ID,Size,Allocator,Allocations,Location
0x1,192,malloc,1,operator new(unsigned long) at <unknown>:0
0x1,56,malloc,1,operator new(unsigned long) at <unknown>:0
...
0x1,944,malloc,1,_PyMem_RawMalloc at Objects/obmalloc.c:99
0x1,944,malloc,1,_PyMem_RawMalloc at Objects/obmalloc.c:99
...
0x25 (fn_api_status_handler),328,realloc,1,upb_Arena_InitSlow at <unknown>:0
0x38 (Thread-30 (_run)),328,realloc,1,upb_Arena_InitSlow at <unknown>:0
0x1c (Thread-6 (_run)),328,realloc,1,upb_Arena_InitSlow at <unknown>:0
...
0x25 (fn_api_status_handler),71,malloc,1,<unknown> at <unknown>:0
0x25 (fn_api_status_handler),87,malloc,1,<unknown> at <unknown>:0
...
In my case the application creates several ephemeral threads, so the thread ID info is not meaningful; it would be nice to have an option to exclude ThreadID from the report. But even that aside, it seems that allocations happening at the same location should be added together, increasing the Size and total Allocations of a single row instead.
I have also tried using --trace-python-allocators, which didn't have a meaningful impact.
I also verified that duplicate entries appear in the table report when not using --native mode. It happens, but at a smaller ratio: post-processing resulted in a 3x reduction in the number of rows.
Expected Behavior
In the table report, there should be only one row for each (Thread_ID, Size, Allocator, Location) tuple.
Bonus: provide the option to exclude ThreadID from the report.
Steps To Reproduce
My setup is somewhat involved: I am profiling a data processing pipeline, also discussed in https://github.com/bloomberg/memray/discussions/852. I am happy to attach the collected profiles if that helps. If a repro is required for a meaningful investigation, let me know.
Memray Version
1.19.1
Python Version
3.10
Operating System
Linux
Anything else?
No response
This behavior is reproducible with Memray's fibonacci exercise (https://github.com/bloomberg/memray/blob/main/docs/tutorials/exercise_1/fibonacci.py).
The table report contains duplicates, for example:
0x1 277.3 kB realloc 1 fibonacci at fibonacci.py:18
0x1 246.4 kB realloc 1 fibonacci at fibonacci.py:18
0x1 277.3 kB realloc 1 fibonacci at fibonacci.py:18
0x1 53.1 MB malloc 28104 fibonacci at fibonacci.py:18
0x1 57.0 MB malloc 29338 fibonacci at fibonacci.py:18
0x1 42.2 MB malloc 24770 fibonacci at fibonacci.py:18
Actually, I spoke too soon: that report was created without --leaks. Still, I am not sure we'd want to see duplicates there either.
Repro for duplicates in the table --leaks report:
cd docs/tutorials/exercise_3
memray run --native lru_cache.py
python -m memray table --leaks --force memray-lru_cache.py.3617871.bin
Output table has duplicate entries:
0x1 2.0 kB calloc 1 arena_map_get at Objects/obmalloc.c:1435
0x1 2.0 kB calloc 1 arena_map_get at Objects/obmalloc.c:1435
0x1 576 B malloc 1 _PyObject_Malloc at Objects/obmalloc.c:1966
0x1 528 B malloc 1 _PyObject_Malloc at Objects/obmalloc.c:1966
I think the table winds up with one entry per thread per distinct call stack leading to an allocation, even though we only show the final stack frame that called the allocator in the table.
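To illustrate, a contrived example (not taken from your program) where two different call stacks reach the same allocating line, which would produce two table rows that look identical:

```python
# leaky.py - contrived illustration: the same line performs the allocation,
# but it is reached through two different call stacks, so the report tracks
# two distinct stacks even though the displayed location is the same.
def alloc():
    return bytearray(64 * 1024)  # the only line that allocates


def from_path_a():
    return alloc()


def from_path_b():
    return alloc()


# Keeping references means both allocations are still live at exit, so under
# --leaks they show up as two separate rows, both pointing at alloc().
leaked = [from_path_a(), from_path_b()]
```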
That sounds like a plausible explanation, and it might explain how the table grew to 20k+ entries, given the use of thread pools in my application.
From a Memray product standpoint, which direction do you think we should go to fix this: document it and declare it working as intended (WAI), or reduce the duplication when creating the report?
I don't have a very strong opinion here... from a product PoV, I think that the table reporter is nearly useless, although I will say that it's indeed more useful with --leaks than without: normally there are hundreds of pages of output from which you can glean nothing useful, while with --leaks the output is likely to be much smaller. A listing of every one of the thousands or millions of allocations that your program made over the course of its run is quite uninteresting, though.
I'm fine with deduplicating by thread id + allocator + caller; that does seem like a reasonable change. I think I'd also be fine with changing the defaults to match the flamegraph reporter, where allocations from different threads are aggregated together by default but there's a --split-threads command line argument to keep them separate. That's arguably a backwards-incompatible change, but I doubt this reporter has many users, and any users it does have would likely be happy with the change... What do you think, @pablogsal ?
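Something along these lines, purely as a sketch of the aggregation (not the actual reporter code; the record fields are illustrative):

```python
# Sketch of deduplicating table rows by (thread, allocator, caller); the field
# names here are illustrative and do not reflect Memray's internal data model.
from collections import defaultdict


def deduplicate(rows, split_threads=True):
    """Merge rows that share the same key, summing size and allocation count."""
    totals = defaultdict(lambda: [0, 0])  # key -> [total_size, total_allocations]
    for row in rows:
        key = (
            row["thread_id"] if split_threads else None,
            row["allocator"],
            row["location"],
        )
        totals[key][0] += row["size"]
        totals[key][1] += row["n_allocations"]
    return [
        {
            "thread_id": thread,
            "allocator": allocator,
            "location": location,
            "size": size,
            "n_allocations": count,
        }
        for (thread, allocator, location), (size, count) in totals.items()
    ]
```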
I don't have a very strong opinion here... from a product PoV, I think that the table reporter is nearly useless.
I have found value in the table reporter in the past, for one of the leaks I investigated. Sorting the lines by the Size column allowed me to quickly find the line of code responsible for the largest allocation, whereas the flamegraph in my scenario was quite large and many spans were hard to read since they didn't all fit on the screen.
I think I'd also be fine with changing the defaults to match the flamegraph reporter, where allocations from different threads are aggregated together by default, but there's a --split-threads command line argument to keep them separate. That's arguably a backwards incompatible change, but I doubt this reporter has many users, and any users it does have would likely be happy with the change...
+1 for this change. If you'd like to keep it backwards compatible, you could add a --join-threads option to the table reporter, but then there would be some divergence between --join-threads for the table reporter and --split-threads for the flamegraph reporter.
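For concreteness, a hypothetical sketch of the backwards-compatible flag (the --join-threads name is illustrative only and is not an existing Memray option; the flamegraph reporter instead aggregates threads by default and offers --split-threads to opt out, which is where the divergence would come from):

```python
# Hypothetical sketch of the backwards-compatible option discussed above; the
# --join-threads flag name is illustrative only, not an existing Memray flag.
import argparse

parser = argparse.ArgumentParser(prog="memray table")
parser.add_argument(
    "--join-threads",
    action="store_true",
    help="Aggregate allocations from all threads into a single row per location",
)

args = parser.parse_args(["--join-threads"])
print(args.join_threads)  # True
```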