Add zfree_with_size
Description
zfree updates memory statistics. It gets the size of the buffer from jemalloc by calling zmalloc_size. This operation is costly. We can avoid it if we know the buffer size. For example, we can calculate size of sds from the data we have in its header.
This commit introduces zfree_with_size function that accepts both pointer to a buffer, and its size. zfree is refactored to call zfree_with_size.
sdsfree uses the new interface for all but SDS_TYPE_5.
Benchmark
Dataset is 3 million strings. Each benchmark run uses its own value size (8192, 512, and 120). The benchmark is 100% write load for 5 minutes.
value size new tps old tps % new us/call old us/call %
8k 272088.53 269971.75 0.78 1.83 1.92 -4.69
512 356881.91 352856.72 1.14 1.27 1.35 -5.93
120 377523.81 368774.78 2.37 1.14 1.19 -4.20
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 70.05%. Comparing base (
4176604) to head (1a3a6d6). Report is 37 commits behind head on unstable.
Additional details and impacted files
@@ Coverage Diff @@
## unstable #453 +/- ##
============================================
- Coverage 70.07% 70.05% -0.03%
============================================
Files 110 110
Lines 59989 60085 +96
============================================
+ Hits 42037 42092 +55
- Misses 17952 17993 +41
| Files | Coverage Δ | |
|---|---|---|
| src/sds.c | 85.86% <100.00%> (-0.13%) |
:arrow_down: |
| src/zmalloc.c | 85.05% <100.00%> (+0.41%) |
:arrow_up: |
I think the zmalloc_size is really expensive to update the used_memory, we had a similar discussion in https://github.com/valkey-io/valkey/pull/308#discussion_r1592038438, for this patch, I'm curious if the assertion of sds->alloc == allocate size in all cases, what happened if call the sdsResize ?
@lipzhu the update of used_memory has 2 cost aspects, the zmalloc_size and also the fact it's an atomic operation.
- regarding
zmalloc_sizecost: it is actually called twice, once in zmalloc code and enother one inside theje_freebecause the tcache needs to know what bin the buffer belongs to. So, if you remove the zmalloc_size call for stat, you are still paying the cost. When usingje_sdallocxto free (this PR) it uses, internally in jemalloc, fast_path that avoids fetching the size to begin with. BTW: I think for cpp delete operator is actually overloaded in jemalloc.cpp to use sized delete. - [edit I see you already have PR for this] regarding
atomiccost: this I believe can be also eliminated with use of thread local vars(I guess the team has considered this before)
@lipzhu good point about sds->alloc == allocate size. I was sure it is. But now I see that sds->alloc is truncated when the sds len is close enough to sds type max len (2^8, 2^16, etc.). I'll update the PR with a fix.
Ok, I wasn't able to get as consistent of performance numbers. Diving into a perf analysis shows about a 20% reduction in CPU in sdsfree though. So it seems like there might be other bottlenecks, this is definitely a good optimization.
Yes, let's go with this approach.