valkey icon indicating copy to clipboard operation
valkey copied to clipboard

Add zfree_with_size

Open poiuj opened this issue 1 year ago • 4 comments

Description

zfree updates memory statistics. It gets the size of the buffer from jemalloc by calling zmalloc_size. This operation is costly. We can avoid it if we know the buffer size. For example, we can calculate size of sds from the data we have in its header.

This commit introduces zfree_with_size function that accepts both pointer to a buffer, and its size. zfree is refactored to call zfree_with_size.

sdsfree uses the new interface for all but SDS_TYPE_5.

Benchmark

Dataset is 3 million strings. Each benchmark run uses its own value size (8192, 512, and 120). The benchmark is 100% write load for 5 minutes.

value size       new tps      old tps      %       new us/call    old us/call    %
8k               272088.53    269971.75    0.78    1.83           1.92           -4.69
512              356881.91    352856.72    1.14    1.27           1.35           -5.93
120              377523.81    368774.78    2.37    1.14           1.19           -4.20

poiuj avatar May 07 '24 12:05 poiuj

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 70.05%. Comparing base (4176604) to head (1a3a6d6). Report is 37 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #453      +/-   ##
============================================
- Coverage     70.07%   70.05%   -0.03%     
============================================
  Files           110      110              
  Lines         59989    60085      +96     
============================================
+ Hits          42037    42092      +55     
- Misses        17952    17993      +41     
Files Coverage Δ
src/sds.c 85.86% <100.00%> (-0.13%) :arrow_down:
src/zmalloc.c 85.05% <100.00%> (+0.41%) :arrow_up:

... and 37 files with indirect coverage changes

codecov[bot] avatar May 08 '24 00:05 codecov[bot]

I think the zmalloc_size is really expensive to update the used_memory, we had a similar discussion in https://github.com/valkey-io/valkey/pull/308#discussion_r1592038438, for this patch, I'm curious if the assertion of sds->alloc == allocate size in all cases, what happened if call the sdsResize ?

zhulipeng avatar May 08 '24 07:05 zhulipeng

@lipzhu the update of used_memory has 2 cost aspects, the zmalloc_size and also the fact it's an atomic operation.

  1. regarding zmalloc_size cost: it is actually called twice, once in zmalloc code and enother one inside the je_free because the tcache needs to know what bin the buffer belongs to. So, if you remove the zmalloc_size call for stat, you are still paying the cost. When using je_sdallocx to free (this PR) it uses, internally in jemalloc, fast_path that avoids fetching the size to begin with. BTW: I think for cpp delete operator is actually overloaded in jemalloc.cpp to use sized delete.
  2. [edit I see you already have PR for this] regarding atomic cost: this I believe can be also eliminated with use of thread local vars(I guess the team has considered this before)

zvi-code avatar May 08 '24 11:05 zvi-code

@lipzhu good point about sds->alloc == allocate size. I was sure it is. But now I see that sds->alloc is truncated when the sds len is close enough to sds type max len (2^8, 2^16, etc.). I'll update the PR with a fix.

poiuj avatar May 08 '24 16:05 poiuj

Ok, I wasn't able to get as consistent of performance numbers. Diving into a perf analysis shows about a 20% reduction in CPU in sdsfree though. So it seems like there might be other bottlenecks, this is definitely a good optimization.

madolson avatar Jun 18 '24 18:06 madolson

Yes, let's go with this approach.

poiuj avatar Jun 18 '24 20:06 poiuj