ddprof
Performance of deallocation code paths - tree bitset
What does this PR do?
Implements a radix tree strategy that uses bitsets to track addresses.
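To make the idea concrete, here is a minimal sketch of a radix tree whose leaves are bitsets. It is not the code in this PR: the class name `AddressBitsetTree`, the 14/14/16 bit split, the 16-byte alignment assumption, and the 48-bit address assumption are all hypothetical choices for illustration. It is also single-threaded, whereas real allocation tracking would need atomics or locking.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical sketch, not ddprof's implementation: a three-level radix tree
// keyed on address bits, with a bitset at each leaf. Not thread-safe.
class AddressBitsetTree {
public:
  AddressBitsetTree() : _root(kFanout) {}

  // Marks addr as tracked. Returns false if it was already tracked.
  bool add(uintptr_t addr) {
    const Key k = split(addr);
    auto &mid = _root[k.top];
    if (!mid) {
      mid = std::make_unique<Mid>(size_t{kFanout});
    }
    auto &leaf = (*mid)[k.mid];
    if (!leaf) {
      leaf = std::make_unique<Leaf>();
    }
    if (leaf->test(k.bit)) {
      return false;
    }
    leaf->set(k.bit);
    return true;
  }

  // Clears addr. Returns false if it was not tracked.
  bool remove(uintptr_t addr) {
    const Key k = split(addr);
    const auto &mid = _root[k.top];
    if (!mid) {
      return false;
    }
    const auto &leaf = (*mid)[k.mid];
    if (!leaf || !leaf->test(k.bit)) {
      return false;
    }
    leaf->reset(k.bit);
    return true;
  }

private:
  // Assumption: allocations are 16-byte aligned, so the low 4 bits carry no
  // information and addresses that differ only in those bits share one slot.
  static constexpr unsigned kAlignBits = 4;
  static constexpr unsigned kLeafBits = 16; // 65536 bits (8 KiB) per leaf
  static constexpr unsigned kNodeBits = 14; // 16384 children per inner node
  static constexpr size_t kFanout = size_t{1} << kNodeBits;

  using Leaf = std::bitset<(size_t{1} << kLeafBits)>;
  using Mid = std::vector<std::unique_ptr<Leaf>>;

  struct Key {
    size_t top;
    size_t mid;
    size_t bit;
  };

  // Assumption: 48-bit user-space addresses. After dropping the alignment
  // bits, 44 bits remain: 14 (top) + 14 (mid) + 16 (leaf bit index).
  // Bits above bit 47 are ignored, so wider addresses would alias.
  static Key split(uintptr_t addr) {
    const uint64_t v = static_cast<uint64_t>(addr) >> kAlignBits;
    return {static_cast<size_t>((v >> (kLeafBits + kNodeBits)) & (kFanout - 1)),
            static_cast<size_t>((v >> kLeafBits) & (kFanout - 1)),
            static_cast<size_t>(v & ((uint64_t{1} << kLeafBits) - 1))};
  }

  std::vector<std::unique_ptr<Mid>> _root;
};
```

In this sketch, each distinct 16-byte slot maps to exactly one bit, so two different tracked addresses never collide the way they can in a fixed-size hashed bitset. The cost is memory that grows with the spread of the addresses seen.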
Motivation
Ensure we account for addresses accurately. Check @richardstartin's idea.
Results
Performance numbers look great. With the reader thread running, some samples cost additional CPU:
Benchmark                                              Time         CPU  Iterations
BM_ShortLived_NoTracking/process_time/real_time   866340 ns  3245594 ns         758
BM_ShortLived_Tracking/process_time/real_time     999061 ns  4036038 ns         613
BM_LongLived_NoTracking/process_time              340981 ns   679761 ns        1074
BM_LongLived_Tracking/process_time                376116 ns  1359466 ns         548
Without the reader thread:
Benchmark                                              Time         CPU  Iterations
BM_ShortLived_NoTracking/process_time/real_time   471468 ns   987956 ns        1366
BM_ShortLived_Tracking/process_time/real_time     502947 ns  1171483 ns        1000
BM_LongLived_NoTracking/process_time              338881 ns   185030 ns        4789
BM_LongLived_Tracking/process_time                330957 ns   281179 ns        2535