Memory leak in 4.0

Status: Open. chenbt-hz opened this issue 1 year ago • 14 comments

Is this a regression?

Yes

Description

Single instance, running a continuous HSET load test at 20k-40k QPS: after one hour, memory usage reached 100%.

Inspected with valgrind, running the pika binary as: valgrind --leak-check=full --tool=memcheck --log-file=valgrind_output.txt ./pika -c pika_9221.conf
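
When reading the valgrind_output.txt log produced by this command, the LEAK SUMMARY section is the first thing to check: "definitely lost" and "indirectly lost" point to a genuine leak, while "still reachable" memory is still referenced when the process exits. The Python sketch below assumes memcheck's usual summary line format (for example `==1234==    definitely lost: 1,024 bytes in 2 blocks`); the helper names `parse_leak_summary` and `looks_like_leak` are hypothetical.

```python
import re

# Matches memcheck LEAK SUMMARY lines such as:
#   ==1234==    definitely lost: 1,024 bytes in 2 blocks
# Per-record lines ("... are definitely lost in loss record ...") have no colon and are skipped.
_LEAK_LINE = re.compile(
    r"(definitely lost|indirectly lost|possibly lost|still reachable|suppressed):"
    r"\s+([\d,]+) bytes in ([\d,]+) blocks"
)


def parse_leak_summary(log_text: str) -> dict[str, tuple[int, int]]:
    """Return {category: (bytes, blocks)} from a valgrind memcheck log (hypothetical helper)."""
    summary: dict[str, tuple[int, int]] = {}
    for match in _LEAK_LINE.finditer(log_text):
        category, nbytes, nblocks = match.groups()
        # Valgrind prints thousands separators; strip them before converting.
        summary[category] = (int(nbytes.replace(",", "")), int(nblocks.replace(",", "")))
    return summary


def looks_like_leak(summary: dict[str, tuple[int, int]]) -> bool:
    """Treat only definitely/indirectly lost bytes as evidence of a real leak."""
    return any(summary.get(cat, (0, 0))[0] > 0 for cat in ("definitely lost", "indirectly lost"))
```

Usage: `looks_like_leak(parse_leak_summary(open("valgrind_output.txt").read()))`. A large "still reachable" figure alone (caches, pools, global singletons) is not a leak in memcheck's sense.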

Please provide a link to a minimal reproduction of the bug

No response

Screenshots or videos

No response

Please provide the version you discovered this bug in (check about page for version information)

Version: built from the unstable branch
OS: CentOS 7

Anything else?

The details of this issue have already been shared with 少一 @w

chenbt-hz, Mar 20 '24 01:03

Please rebuild from the current unstable branch and test again.

AlexStocks, Mar 20 '24 04:03

> Please rebuild from the current unstable branch and test again.

Code version: [screenshot]

Test results on the unstable branch: [screenshot]

chenbt-hz, Mar 20 '24 08:03

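I suspect your memory usage figure includes the page cache, and most of that memory is actually held by the page cache. Next time this happens, drop the page cache and check whether memory usage comes back down: sync; echo 3 | sudo tee /proc/sys/vm/drop_caches (note that sudo echo 3 >> /proc/sys/vm/drop_caches does not work for a non-root user, because the redirection is performed by the unprivileged shell, not by sudo). @chenbt-hz cc @AlexStocks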
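
To check this suggestion without dropping caches, one can compare "used" memory with and without the page cache. The Python sketch below assumes the Linux /proc/meminfo format (values in kB); the helper names are hypothetical, and "total minus free, buffers and cached" is a common approximation, not what pika's info command reports. Comparing the result with the pika process's own resident size (VmRSS in /proc/<pid>/status) shows how much of a "100% used" reading is page cache.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (kB for most fields)."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            # HugePages_* fields are page counts, not kB; they are not used below.
            fields[key.strip()] = int(parts[0])
    return fields


def used_excluding_cache_kb(info: dict[str, int]) -> int:
    """Approximate memory used by applications: total minus free, buffers and page cache."""
    return info["MemTotal"] - info["MemFree"] - info.get("Buffers", 0) - info.get("Cached", 0)


def page_cache_kb(info: dict[str, int]) -> int:
    """Memory held by buffers and the page cache, most of which the kernel can reclaim."""
    return info.get("Buffers", 0) + info.get("Cached", 0)
```

Usage: `info = parse_meminfo(open("/proc/meminfo").read())`, then compare `used_excluding_cache_kb(info)` with `page_cache_kb(info)`.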

wangshao1, Mar 31 '24 11:03

Also, it looks like you have swap enabled? For production deployments we recommend turning swap off.
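
To confirm whether swap is in use, both system-wide and by the pika process itself, one can read SwapTotal/SwapFree from /proc/meminfo and VmSwap from /proc/<pid>/status. A minimal Python sketch, assuming Linux /proc files; the helper names are hypothetical. Disabling swap itself is done with the standard swapoff -a command, plus removing the swap entries from /etc/fstab so it stays off after a reboot.

```python
def parse_kb_fields(text: str) -> dict[str, int]:
    """Parse 'Key:   value kB' lines, as found in /proc/meminfo and /proc/<pid>/status."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep numeric fields only; /proc/<pid>/status also has text fields such as Name and State.
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def swap_enabled(meminfo_text: str) -> bool:
    """True if any swap space is configured on the machine."""
    return parse_kb_fields(meminfo_text).get("SwapTotal", 0) > 0


def system_swap_used_kb(meminfo_text: str) -> int:
    """Swap currently in use system-wide, in kB."""
    info = parse_kb_fields(meminfo_text)
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def process_swap_kb(status_text: str) -> int:
    """VmSwap from /proc/<pid>/status: how much of this process is swapped out, in kB."""
    return parse_kb_fields(status_text).get("VmSwap", 0)
```

Usage: `process_swap_kb(open(f"/proc/{pid}/status").read())` for the pika process ID.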

wangshao1, Mar 31 '24 14:03

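Summary of the situation:

  1. When block_size is large (for example 64G), it takes up a large amount of memory.
  2. The memory usage reported by info may not account for block_size.
  3. After disabling block_size and testing again, memory usage still grows slowly under continuous data import, but the impact is small (about 1 GB of growth per 1 TB imported); the cause of this growth still needs to be identified.

Next steps:

  1. Improve the memory usage statistics reported by info.
  2. Identify the cause of the memory growth.
  3. Run a longer verification.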
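
Next step 1 essentially means adding the block cache and other RocksDB memory to what info reports. The Python sketch below shows one rough way to account for it under a simplified, assumed model: block cache capacity (counted once if shared, otherwise once per instance), plus memtables and table readers per instance. The class and function names are hypothetical, not pika's info fields; in RocksDB the per-instance figures roughly correspond to the rocksdb.cur-size-all-mem-tables and rocksdb.estimate-table-readers-mem properties. The same arithmetic puts the observed growth in proportion: about 1 GB per 1 TB imported is roughly 0.1% of the data written.

```python
from dataclasses import dataclass


@dataclass
class InstanceMemory:
    """Per-RocksDB-instance memory in bytes (hypothetical model, not pika's info fields)."""
    memtables: int      # cf. rocksdb.cur-size-all-mem-tables
    table_readers: int  # cf. rocksdb.estimate-table-readers-mem


def estimate_rocksdb_memory(instances: list[InstanceMemory],
                            block_cache_capacity: int,
                            shared_block_cache: bool) -> int:
    """Upper-bound estimate: block cache capacity + memtables + table readers.

    A shared block cache is counted once; otherwise each instance is assumed
    to own a cache of the same capacity.
    """
    if shared_block_cache:
        caches = block_cache_capacity
    else:
        caches = block_cache_capacity * len(instances)
    return caches + sum(i.memtables + i.table_readers for i in instances)


def growth_ratio(imported_bytes: int, growth_bytes: int) -> float:
    """Memory growth expressed as a fraction of the data imported."""
    return growth_bytes / imported_bytes
```

For example, `growth_ratio(1024**4, 1024**3)` is 1/1024, about 0.1%.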

chenbt-hz, Apr 09 '24 09:04

少一: check how many SST files a single RocksDB instance has; max files should be set lower than that SST file count.
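
The reasoning behind this check: "max files" is presumably RocksDB's max_open_files (exposed in pika as the max-cache-file(s) option). When it is below the number of SST files, RocksDB's table cache has to evict table readers, which bounds their memory; when it is -1 or larger than the SST count, every table reader stays open and table-reader memory grows with the number of files. A Python sketch for gathering the number; the directory path and limit passed in are assumptions, and the helper names are hypothetical.

```python
from pathlib import Path


def count_sst_files(db_dir: str) -> int:
    """Count RocksDB table files (*.sst) under db_dir, including subdirectories."""
    return sum(1 for p in Path(db_dir).rglob("*.sst") if p.is_file())


def table_cache_bounds_readers(sst_count: int, max_open_files: int) -> bool:
    """True if the open-file limit is below the SST count, so table readers get evicted.

    max_open_files == -1 means "keep every table open", so nothing is ever evicted.
    """
    return max_open_files != -1 and max_open_files < sst_count
```

Usage: `table_cache_bounds_readers(count_sst_files("/path/to/db"), 5000)`; a False result means table-reader memory is effectively unbounded by the open-file limit.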

AlexStocks, Apr 26 '24 12:04

We can now be fairly confident that pika currently has no memory leak.

AlexStocks, Apr 26 '24 12:04

In the configuration file, set cache-index-and-filter-blocks: yes. With this enabled, RocksDB stores index and filter data in the block cache, and when the block cache is full it evicts data in LRU order, so memory usage stays roughly bounded.

https://github.com/OpenAtomFoundation/pika/issues/1048

https://github.com/OpenAtomFoundation/pika/issues/1561#issuecomment-1575896838

See these two issues for reference.
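
Why this bounds memory: index and filter blocks then compete with data blocks for one fixed-capacity LRU cache instead of being held outside it. Below is a toy Python model of that idea, a byte-charged LRU cache with hypothetical block keys. RocksDB's real LRUCache is sharded and supports high-priority entries for index/filter blocks, which this sketch ignores.

```python
from collections import OrderedDict


class ByteLRUCache:
    """Toy capacity-bounded LRU cache, charged by block size in bytes."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.usage = 0
        self._entries: OrderedDict[str, int] = OrderedDict()  # key -> charge in bytes

    def get(self, key: str) -> bool:
        """Return True on a hit and mark the entry as most recently used."""
        if key not in self._entries:
            return False
        self._entries.move_to_end(key)
        return True

    def insert(self, key: str, charge: int) -> None:
        """Insert a block, then evict least recently used blocks until usage fits."""
        if key in self._entries:
            self.usage -= self._entries.pop(key)
        self._entries[key] = charge
        self.usage += charge
        while self.usage > self.capacity and self._entries:
            _, evicted = self._entries.popitem(last=False)
            self.usage -= evicted

    def __contains__(self, key: str) -> bool:
        return key in self._entries
```

Whatever mix of index, filter and data blocks is inserted, `usage` never stays above `capacity`, which is the property the config option relies on.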

lqxhub, Apr 26 '24 12:04

Built from the unstable branch and tested by writing 3k-length values with HSET, using the following configuration: db-instance-num: 6, block-cache: 128M, max-cache-file=5000, enable-partitioned-index-filters: yes, cache-index-and-filter-blocks=true, pin_l0_filter_and_index_blocks_in_cache = yes, share-block-cache: true

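After writing 1.6 TB of data, memory usage is basically stable. The tablereader curve keeps growing and has not yet levelled off, but its impact is relatively contained.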
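
Whether the tablereader curve is still rising or has started to flatten can be judged more objectively than by eye by fitting a least-squares slope over the most recent samples. A Python sketch; the window size, threshold and sample values are hypothetical, and the series itself would come from whatever metric the graph plots (for example RocksDB's rocksdb.estimate-table-readers-mem property, sampled against data written).

```python
def least_squares_slope(xs: list[float], ys: list[float]) -> float:
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def has_plateaued(xs: list[float], ys: list[float], window: int, max_slope: float) -> bool:
    """True if the slope over the last `window` samples is at most max_slope."""
    return least_squares_slope(xs[-window:], ys[-window:]) <= max_slope
```

With x as TB written and y as table-reader memory in MB, the slope reads directly as "MB of table-reader memory per TB written", which is comparable with the earlier observation of about 1 GB per 1 TB.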

chenbt-hz, May 07 '24 02:05