ARDB crashes, no error message in log...
It seems like it might be related to using the redis pipeline feature, but I can't figure out where to look to see why it crashed.
Is there a specific error log where I should be able to see why it went down? It just goes down w/o any notice. It is now going down after every 300K requests (hGet) or so.
Drive space is good --- 300G 106G 195G 36% /data
Server stats look fine - ARDB at about 50% CPU and 30% RAM --- 20 0 3945348 3.451g 2360 S 47.9 29.4 1:07.12 ardb-server
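For context, here is a minimal sketch of the kind of pipelined hGet loop described above (written with redis-py; the key and field names are invented, since the original code isn't shown):

# Hypothetical reconstruction of the pipelined lookup workload; only the
# shape of the traffic matters, not the exact key/field names.
import redis

r = redis.StrictRedis(host='127.0.0.1', port=6379)

def lookup_field(keys, field='some_field', batch_size=1000):
    """HGET one field for many hash keys, batched through a pipeline."""
    results = {}
    for i in range(0, len(keys), batch_size):
        chunk = keys[i:i + batch_size]
        pipe = r.pipeline(transaction=False)
        for key in chunk:
            pipe.hget(key, field)
        results.update(zip(chunk, pipe.execute()))
    return results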
anything in syslog?
Aug 17 12:40:24 dalwalrusrocksdb01 kernel: [762268.331828] ardb-server invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Aug 17 12:40:24 dalwalrusrocksdb01 kernel: [762268.331834] ardb-server cpuset=/ mems_allowed=0
Aug 17 12:40:24 dalwalrusrocksdb01 kernel: [762268.331837] CPU: 0 PID: 6832 Comm: ardb-server Not tainted 3.13.0-98-generic #145-Ubuntu
Aug 17 12:40:24 dalwalrusrocksdb01 kernel: [762268.332126] [ 6808] 0 6808 4228321 2941740 8120 1001920 0 ardb-server
Aug 17 12:40:24 dalwalrusrocksdb01 kernel: [762268.332130] Out of memory: Kill process 6808 (ardb-server) score 929 or sacrifice child
Aug 17 12:40:24 dalwalrusrocksdb01 kernel: [762268.333614] Killed process 6808 (ardb-server) total-vm:16913284kB, anon-rss:11766960kB, file-rss:0kB
Aug 17 13:06:53 dalwalrusrocksdb01 kernel: [763857.356090] [ 8980] 0 8980 4349153 2979894 8357 996540 0 ardb-server
Aug 17 13:06:53 dalwalrusrocksdb01 kernel: [763857.356105] Out of memory: Kill process 8980 (ardb-server) score 937 or sacrifice child
Aug 17 13:06:53 dalwalrusrocksdb01 kernel: [763857.358158] Killed process 8980 (ardb-server) total-vm:17396612kB, anon-rss:11919496kB, file-rss:80kB
Those are the syslog entries that mention "ardb".
is there a max memory option or something that I need to set, or do I need to give the machine more RAM?
I've got 12GB of RAM on this machine, so I'm really surprised it would run out.
I removed the pipeline method and am just looking everything up one at a time while watching top. ARDB has risen to 31.9% memory and seems to rest there nicely for a while, and then the OOM killer takes it down.
If I start sending requests the second ardb-server comes up, it very rapidly climbs to 100% memory and crashes. If I wait a little bit (maybe 3 minutes), it then takes maybe 15 minutes to crash, although it is intermittent: sometimes only a few minutes, sometimes it lasts 30.
It seems like it has to be some sort of memory leak since it is occurring not because of inserts but because of hGet requests.
I will happily donate $1k to this project (if that is possible) if we can fix this problem. It is pretty mission critical at this moment.
@yinqiwen I'm not sure if this helps...
[22001] 08-17 16:00:00,453 INFO coststat_hget_all:calls=300745,costs=81819213,cost_per_call=272,percents=100.0000%
[22001] 08-17 16:00:00,453 INFO coststat_hget_range[0-1000]:calls=299616,costs=74902330,cost_per_call=249,percents=99.6246%
[22001] 08-17 16:00:00,453 INFO coststat_hget_range[1000-5000]:calls=865,costs=1608681,cost_per_call=1859,percents=0.2876%
[22001] 08-17 16:00:00,453 INFO coststat_hget_range[5000-10000]:calls=24,costs=164977,cost_per_call=6874,percents=0.0080%
[22001] 08-17 16:00:00,453 INFO coststat_hget_range[10000-20000]:calls=140,costs=2466021,cost_per_call=17614,percents=0.0466%
[22001] 08-17 16:00:00,453 INFO coststat_hget_range[20000-50000]:calls=98,costs=2555608,cost_per_call=26077,percents=0.0326%
[22001] 08-17 16:00:00,453 INFO coststat_hget_range[50000-100000]:calls=2,costs=121596,cost_per_call=60798,percents=0.0007%
@yinqiwen I updated to the latest version but there was no improvement; it still has the memory leak.
The DB has A LOT of keys/hashes. Each hash has probably 48 values in it.
Do you think it would work better if I moved this out of hashes and just stored JSON blobs of the data under plain string keys (GET/SET)?
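For illustration, a rough sketch of what the two layouts look like from the client side (redis-py; the key and field names are made up):

# Hedged sketch comparing the current hash layout with a JSON-blob layout.
import json
import redis

r = redis.StrictRedis(host='127.0.0.1', port=6379)
record = {'field_%d' % i: 'value_%d' % i for i in range(48)}  # ~48 fields per hash

# Current layout: one hash per record, read a single field with HGET.
r.hmset('hash:record:1', record)   # hmset is deprecated in newer redis-py; hset(name, mapping=...) replaces it
one_field = r.hget('hash:record:1', 'field_0')

# Alternative layout: one JSON string per record, read the whole blob with GET.
r.set('json:record:1', json.dumps(record))
whole_record = json.loads(r.get('json:record:1'))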
@yinqiwen I tried switching to hGetAll to see if perhaps just getting all the raw data would be better and less overhead, but no luck.
I tried setting redis-compatible-mode to yes, as I read in some other threads that this could cause memory issues. Unfortunately that did not do the trick; if anything, it hastened the crash.
use the info databases command in redis-cli, which reports the memory usage of rocksdb. there are already some reports about rocksdb using too much memory, which can be solved by tweaking the rocksdb configuration.
and is the hgetall/hget test using the same key or random keys? if there is a memory leak in pipeline mode, test whether hget/hgetall with the same key, or even just ping, can also cause an OOM error.
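A rough sketch of that same-key vs. random-key comparison (redis-py; the key names, field name, and request counts are made up):

# Hedged sketch of the suggested test: hammer HGET with either a fixed key or
# random keys and watch the server's RSS. Assumes ARDB answers INFO like Redis.
import random
import redis

r = redis.StrictRedis(host='127.0.0.1', port=6379)

def hammer_hget(next_key, total=300000, pipeline_size=100):
    sent = 0
    while sent < total:
        pipe = r.pipeline(transaction=False)
        for _ in range(pipeline_size):
            pipe.hget(next_key(), 'some_field')
        pipe.execute()
        sent += pipeline_size
        if sent % 50000 == 0:
            print(sent, r.info('memory').get('used_memory_rss'))

hammer_hget(lambda: 'hash:record:1')                               # same key every time
hammer_hget(lambda: 'hash:record:%d' % random.randint(1, 10**7))  # random keys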
i just tested ping, hget with the same key, and hgetall with the same key using redis-benchmark in pipeline mode with a total of 100000000 requests, and there is no memory leak.
is the ardb the latest version 0.10? and what's the configuration of rocksdb?
since hget/hgetall just read & cache data from rocksdb, this issue seems related to the rocksdb cache. i just added 2 configs to the master/0.10 branch to disable the read cache for hget/hgetall in rocksdb. u can add the below configs into ardb.conf:
rocksdb.read_fill_cache no
rocksdb.iter_fill_cache no
First, thank you so much for your help!
- I removed the pipelining and still had the problem.
- The keys are random
- I am using the latest from just doing "git clone https://github.com/yinqiwen/ardb.git"
- Here are the results of the "info databases" command...
data_dir:/data/rocksdb
used_disk_space:107968311063
rocksdb_version:5.14.2
rocksdb.block_table_usage:0
rocksdb.block_table_pinned_usage:0
rocksdb_memtable_total:117442128
rocksdb_memtable_unflushed:117442128
rocksdb_table_readers_total:0
rocksdb.estimate-table-readers-mem:0
rocksdb.cur-size-all-mem-tables:58721064
** Compaction Stats [0] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
L0 2/0 127.32 MB 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
L1 3/0 208.97 MB 0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
L2 44/0 2.46 GB 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
L3 388/0 24.98 GB 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
L4 1082/0 72.79 GB 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Sum 1519/0 100.55 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Int 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Uptime(secs): 65.9 total, 65.9 interval
Flush(GB): cumulative 0.000, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
** File Read Latency Histogram By Level [0] **
** DB Stats **
Uptime(secs): 65.9 total, 65.9 interval
Cumulative writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 GB, 0.00 MB/s
Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 MB, 0.00 MB/s
Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 MB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
- I added "rocksdb.read_fill_cache no" and "rocksdb.iter_fill_cache no" to ardb.conf, and upgraded to 0.10. It still crashed when I attempted to run my code. It actually crashes faster when I use hGetAll rather than just hGet to grab the specific field I need, which makes me suspicious that a cache is getting filled somewhere. Just which cache and how to prevent it, I don't know.
- RocksDB has the out-of-the-box configuration. I'm not even sure how/where to make changes to that configuration, but I would be happy to do so if you know how it might help this issue.
- I ran a test where I just requested the same key over and over again using hGetAll and it had no problems, so it is definitely filling up a cache or something of that sort as I request more keys. Is there some way to flush the cache periodically?
- can u paste the full info all & client list results when the memory usage is high?
- to change rocksdb's config, just edit rocksdb.options in ardb.conf
- according to this wiki https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB , we can increase the block size & optimize bloomfilter settings to reduce memory usage.
- this configuration changes the block size from 4k to 32k, uses 8 bits per key for the bloomfilter, and enables the flag optimize_filters_for_hits:
rocksdb.options write_buffer_size=512M;max_write_buffer_number=5;min_write_buffer_number_to_merge=3;compression=kSnappyCompression;\
bloom_locality=1;memtable_prefix_bloom_size_ratio=0.1;\
block_based_table_factory={block_cache=512M;filter_policy=bloomfilter:8:true};\
create_if_missing=true;max_open_files=10000;rate_limiter_bytes_per_sec=50M;\
use_direct_io_for_flush_and_compaction=true;use_adaptive_mutex=true;optimize_filters_for_hits=true
- disable the replication log in ardb.conf if there is no slave of the current instance, which could save 1G of memory at most:
repl-backlog-size 0
- adding
reply-pool-size 100
into ardb.conf would reduce the reply pool size from 10000 per thread to 100 per thread.
- i just added a redis reply count report to the info command in the latest code.
Woohoo! Your changes seem to have halted the memory growth at somewhere around 58%! Here is the info all output...
=======================
Server
ardb_version:0.10.0
redis_version:2.8.0
engine:rocksdb
ardb_home:/install
os:Linux 3.13.0-98-generic x86_64
gcc_version:4.8.4
process_id:18207
run_id:2bc2903a5bf449dc3eab36ab947a185db400a24b
tcp_port:6379
listen:0.0.0.0:6379
uptime_in_seconds:24
uptime_in_days:0
executable:/install/ardb/src/ardb-server
config_file:/install/ardb/ardb.conf
Databases
data_dir:/data/rocksdb
used_disk_space:107968311059
living_iterator_num:0
rocksdb_version:5.14.2
rocksdb.block_table_usage:0
rocksdb.block_table_pinned_usage:0
rocksdb_memtable_total:117442128
rocksdb_memtable_unflushed:117442128
rocksdb_table_readers_total:0
rocksdb.estimate-table-readers-mem:0
rocksdb.cur-size-all-mem-tables:58721064
** Compaction Stats [0] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
L0 2/0 127.32 MB 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
L1 3/0 208.97 MB 0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
L2 44/0 2.46 GB 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
L3 388/0 24.98 GB 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
L4 1082/0 72.79 GB 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Sum 1519/0 100.55 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Int 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Uptime(secs): 23.4 total, 23.4 interval
Flush(GB): cumulative 0.000, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
** File Read Latency Histogram By Level [0] **
** DB Stats **
Uptime(secs): 23.4 total, 23.4 interval
Cumulative writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 GB, 0.00 MB/s
Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 MB, 0.00 MB/s
Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 MB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
Clients
connected_clients:1 blocked_clients:0
Persistence
loading:0
rdb_last_save_time:0
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_bgsave_in_progress:0
rdb_current_bgsave_time_sec:0
CPU
used_cpu_sys:0.08 used_cpu_user:0.04
Replication
role: singleton repl_backlog_size: 0
Memory
used_memory_rss:261099520
Stats
total_commands_processed:0
instantaneous_ops_per_sec:0
total_connections_received:1
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
pubsub_channels:0
pubsub_patterns:0
expire_scan_keys:0
Keyspace
db0:keys=4964608156
Commandstats
Coststats
127.0.0.1:6379> info all
Server
ardb_version:0.10.0
redis_version:2.8.0
engine:rocksdb
ardb_home:/install
os:Linux 3.13.0-98-generic x86_64
gcc_version:4.8.4
process_id:18207
run_id:2bc2903a5bf449dc3eab36ab947a185db400a24b
tcp_port:6379
listen:0.0.0.0:6379
uptime_in_seconds:159
uptime_in_days:0
executable:/install/ardb/src/ardb-server
config_file:/install/ardb/ardb.conf
Databases
data_dir:/data/rocksdb
used_disk_space:107968311059
living_iterator_num:1
rocksdb_version:5.14.2
rocksdb.block_table_usage:4096
rocksdb.block_table_pinned_usage:4096
rocksdb_memtable_total:117442128
rocksdb_memtable_unflushed:117442128
rocksdb_table_readers_total:7064739869
rocksdb.estimate-table-readers-mem:0
rocksdb.cur-size-all-mem-tables:58721064
** Compaction Stats [0] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
L0 2/0 127.32 MB 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
L1 3/0 208.97 MB 0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
L2 44/0 2.46 GB 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
L3 388/0 24.98 GB 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
L4 1082/0 72.79 GB 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Sum 1519/0 100.55 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Int 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Uptime(secs): 158.4 total, 135.0 interval
Flush(GB): cumulative 0.000, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
** File Read Latency Histogram By Level [0] **
** DB Stats **
Uptime(secs): 158.4 total, 135.0 interval
Cumulative writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 GB, 0.00 MB/s
Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 MB, 0.00 MB/s
Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 MB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
Clients
connected_clients:2 blocked_clients:0
Persistence
loading:0
rdb_last_save_time:0
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_bgsave_in_progress:0
rdb_current_bgsave_time_sec:0
CPU
used_cpu_sys:7.59 used_cpu_user:32.23
Replication
role: singleton repl_backlog_size: 0
Memory
used_memory_rss:7389245440
Stats
total_commands_processed:44540
instantaneous_ops_per_sec:1169
total_connections_received:2
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
pubsub_channels:0
pubsub_patterns:0
expire_scan_keys:0
Keyspace
db0:keys=4964608156
Commandstats
cmdstat_hgetall:calls=44539,usec=54426420,usecpercall=1221
cmdstat_info:calls=1,usec=2553,usecpercall=2553
Coststats
coststat_info_all:calls=1,costs=2553,cost_per_call=2553,percents=100.0000%
coststat_info_range[1000-5000]:calls=1,costs=2553,cost_per_call=2553,percents=100.0000%
coststat_hgetall_all:calls=44539,costs=54426420,cost_per_call=1221,percents=100.0000%
coststat_hgetall_range[0-1000]:calls=42494,costs=24641496,cost_per_call=579,percents=95.4085%
coststat_hgetall_range[1000-5000]:calls=933,costs=1403273,cost_per_call=1504,percents=2.0948%
coststat_hgetall_range[5000-10000]:calls=61,costs=426992,cost_per_call=6999,percents=0.1370%
coststat_hgetall_range[10000-20000]:calls=553,costs=10021981,cost_per_call=18122,percents=1.2416%
coststat_hgetall_range[20000-50000]:calls=429,costs=13214553,cost_per_call=30803,percents=0.9632%
coststat_hgetall_range[50000-100000]:calls=65,costs=4209337,cost_per_call=64759,percents=0.1459%
coststat_hgetall_range[100000-200000]:calls=4,costs=508788,cost_per_call=127197,percents=0.0090%
in the info output, u can see most of the memory is used by rocksdb's index/filter; this wiki shows a way to alleviate the overhead: https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters
rocksdb_table_readers_total:7064739869
....
used_memory_rss:7389245440
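For reference, a hedged sketch of how the partitioned index/filter settings from that wiki might be folded into the block_based_table_factory part of the rocksdb.options example above (the option names come from the RocksDB wiki; whether ardb's options parser accepts them in exactly this form is an assumption, not something verified here):

block_based_table_factory={block_cache=512M;filter_policy=bloomfilter:8:true;\
        cache_index_and_filter_blocks=true;index_type=kTwoLevelIndexSearch;\
        partition_filters=true;metadata_block_size=4096};\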