Running a large sysbench prepare against facebook/mysql-5.6 causes it to OOM
Running a large sysbench prepare against facebook/mysql-5.6 causes it to OOM. The same large sysbench prepare against Percona Server's MyRocks with exactly the same configuration runs fine without any OOM.
Also, once facebook/mysql-5.6 is restarted after being killed by the OOM killer, it is unable to initialize the RocksDB storage engine due to corruption. Some excerpts from the log are shown below:
2018-04-18 07:15:46 117997 [Note] RocksDB: Begin index creation (0,434)
2018-04-18 07:15:46 117997 [Note] RocksDB: Begin index creation (0,428)
2018-04-18 07:15:46 117997 [Note] RocksDB: Begin index creation (0,435)
2018-04-18 07:15:46 117997 [Note] RocksDB: Begin index creation (0,429)
2018-04-18 07:15:46 117997 [Note] RocksDB: Begin index creation (0,432)
2018-04-18 07:15:46 117997 [Note] RocksDB: Begin index creation (0,430)
2018-04-18 07:15:46 117997 [Note] RocksDB: Begin index creation (0,431)
2018-04-18 07:15:46 117997 [Note] RocksDB: Begin index creation (0,433)
2018-04-18 07:39:29 129536 [Warning] You need to use --log-bin to make --binlog-format work.
2018-04-18 07:39:29 129536 [Note] Plugin 'FEDERATED' is disabled.
2018-04-18 07:39:29 129536 [Warning] The option innodb (skip-innodb) is deprecated and will be removed in a future release
2018-04-18 07:39:29 129536 [Note] Plugin 'InnoDB' is disabled.
2018-04-18 07:39:30 129536 [Note] RocksDB: 2 column families found
2018-04-18 07:39:30 129536 [Note] RocksDB: Column Families at start:
2018-04-18 07:39:30 129536 [Note] cf=default
2018-04-18 07:39:30 129536 [Note] write_buffer_size=67108864
2018-04-18 07:39:30 129536 [Note] target_file_size_base=33554432
2018-04-18 07:39:30 129536 [Note] cf=__system__
2018-04-18 07:39:30 129536 [Note] write_buffer_size=67108864
2018-04-18 07:39:30 129536 [Note] target_file_size_base=33554432
2018-04-18 07:39:32 129536 [ERROR] RocksDB: Error opening instance, Status Code: 2, Status: Corruption: truncated header
2018-04-18 07:39:32 129536 [ERROR] Plugin 'ROCKSDB' init function returned error.
2018-04-18 07:39:32 129536 [ERROR] Plugin 'ROCKSDB' registration as a STORAGE ENGINE failed.
2018-04-18 07:39:32 129536 [ERROR] Unknown/unsupported storage engine: rocksdb
2018-04-18 07:39:32 129536 [ERROR] Aborting
Version info:
- MyRocks PS is Percona Server 5.7.21-20-1
- MyRocks FB is FB MySQL 5.6.35
The RocksDB configuration options are defined the same way on both MyRocks PS and MyRocks FB, and are pasted below:
rocksdb_max_open_files = 32000
rocksdb_max_background_jobs = 8
rocksdb_max_total_wal_size = 4G
rocksdb_flush_log_at_trx_commit = 2
rocksdb_block_size = 16384
rocksdb_block_cache_size = 3G
rocksdb_table_cache_numshardbits = 6
rocksdb_bytes_per_sync = 4194304
rocksdb_wal_bytes_per_sync = 4194304
rocksdb_use_direct_io_for_flush_and_compaction = 1
rocksdb_rate_limiter_bytes_per_sec = 0 #Unlimited
rocksdb_compaction_sequential_deletes_count_sd = 1
rocksdb_compaction_sequential_deletes = 199999
rocksdb_compaction_sequential_deletes_window = 200000
rocksdb_default_cf_options = write_buffer_size=64m;target_file_size_base=32m;max_bytes_for_level_base=512m;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=10;level0_stop_writes_trigger=15;max_write_buffer_number=4;compression_per_level=kLZ4Compression;bottommost_compression=kZlibCompression;compression_opts=-14:6:0;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;compaction_pri=kMinOverlappingRatio
The dataset size is ~50GB while the system has 18GB of memory.
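For rough context, a back-of-the-envelope memory budget from these settings (a sketch, assuming only the two column families shown in the startup log) suggests the nominal RocksDB footprint is far below the 18GB of system memory:

# block cache:          3GB (rocksdb_block_cache_size)
# memtable upper bound: 64MB write_buffer_size x 4 max_write_buffer_number x 2 CFs = ~512MB
# index/filter blocks:  charged to the block cache (cache_index_and_filter_blocks=1)
# nominal total:        roughly 3.5GB, so an OOM suggests overhead outside these settings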
I also tested with set global rocksdb_commit_in_the_middle=1; but to no avail.
At what git hash did you compile? Do you use jemalloc, tcmalloc, or glibc malloc?
You can restart the mysql instance by setting rocksdb_wal_recovery_mode=2. The error message was a bit confusing, but it means a WAL entry got corrupted (possibly because of the OOM). rocksdb_wal_recovery_mode=2 truncates the corrupted WAL entry (and all entries following it), then continues recovery. It will end up losing those WAL entries, but in your case it should be fine.
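A minimal sketch of applying this, assuming mysqld is started directly (adjust for your service manager; the option can equally go under [mysqld] in my.cnf):

# Start mysqld once in point-in-time WAL recovery mode; mode 2 truncates the
# WAL at the first corrupted record and discards everything after it.
mysqld --rocksdb_wal_recovery_mode=2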
Here is the git hash information:
2018-04-19 08:06:41 36299 [Note] MySQL git hash: 9a01e50164e448831fd8b4a899eafe61695b7d3e - %cI
2018-04-19 08:06:41 36299 [Note] RocksDB git hash: 31ee4bf2400cf1807718192d352ca8bd7837ada2 - %cI
And I am using glibc malloc.
Thanks @yoshinorim, I was able to restart mysql by setting rocksdb_wal_recovery_mode=2. Since rocksdb_flush_log_at_trx_commit=1, all the WAL entries for committed transactions should have been synced to disk, so I assume the WAL entries getting corrupted because of the OOM is a separate issue.
I cannot reproduce this on a server with 16gb of RAM and a 12gb block cache when using jemalloc. I tried with indexes created both before and after the load. But I just noticed that you used glibc malloc, so I will repeat with that. My experience with glibc malloc + RocksDB has not been good: RocksDB puts a lot of stress on an allocator's ability to manage fragmentation, and glibc is much worse than jemalloc at that. The end result was a 2X larger RSS, which can explain the OOM.
My git hashes were:
2018-04-21 10:59:27 7009 [Note] MySQL git hash: 6f8a6f19b46023c0f08d87a8abd5225b4c3ed154 - 2018-04-06T13:31:10-07:00
2018-04-21 10:59:27 7009 [Note] RocksDB git hash: 89d989ed75ed89e756156d1f82e123b24591be8c - 2018-03-29T14:46:41-07:00
And this is the latest commit for that git hash:
commit 6f8a6f19b46023c0f08d87a8abd5225b4c3ed154
Author: Jay Edgar [email protected]
Date: Wed Mar 21 14:50:55 2018 -0700
Return state information for detached sessions
You are on a more recent build:
commit 9a01e50164e448831fd8b4a899eafe61695b7d3e
Author: Herman Lee [email protected]
Date: Mon Apr 16 21:00:42 2018 -0700
Fix DBUG_ASSERT in compressed event cache teardown
Good info, Mark. This could also explain the difference between the FB-MySQL + MyRocks test and the Percona Server test: with the standard PS packages and installer, jemalloc is required and enabled (and THP disabled) for both MyRocks and TokuDB via the ps-admin script.
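For builds where nothing sets this up automatically, a sketch of running mysqld under jemalloc (the .so path below is an assumption for Ubuntu 16.04; adjust per distribution):

# Preload jemalloc when starting the server directly...
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 mysqld &
# ...or let mysqld_safe do it via this my.cnf setting:
#   [mysqld_safe]
#   malloc-lib = /usr/lib/x86_64-linux-gnu/libjemalloc.so.1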
Using a 4gb block cache with indexes created after the load, RSS was 4.8gb with jemalloc vs 12.4gb with glibc-2.23 malloc. The test server is Ubuntu 16.04. My guess is that OOM really means OOM, and jemalloc is the answer.
http://smalldatum.blogspot.com/2018/04/myrocks-malloc-and-fragmentation-strong.html
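One quick way to confirm which allocator a running mysqld actually picked up (assumes a single mysqld process on the host):

# A jemalloc line in the process maps means the preload took effect;
# no output means the binary is running on glibc malloc.
grep -i jemalloc /proc/"$(pidof mysqld)"/maps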
I am now using MyRocks with jemalloc. Running with a 12GB block cache on a host with 16GB of RAM caused mysqld to be killed by the OOM killer on both PS and FB-MySQL, but I think that is expected, because sysbench was issuing huge transactions. I will test it again with rocksdb_commit_in_the_middle=1.
Can you share a sysbench command line and my.cnf from the OOM? I use a 12gb block cache on 16g RAM servers and sysbench without OOM. RSS for mysqld reaches ~14gb during the test.
Here is the sysbench command:
sysbench ./trips.lua --db-driver=mysql --threads=8 --tables=16 --table-size=5000000 --mysql_storage_engine=RocksDB --mysql_table_options="DEFAULT CHARSET=utf8 COLLATE utf8_bin" --mysql-host=host01 --mysql-user=sysbench --mysql-password=sysbench --mysql-db=test --rand-type=uniform prepare
So this isn't a new OOM. If this occurs with glibc malloc, then switching to jemalloc is the fix. If it occurs with jemalloc, then can you repeat the test with the RocksDB block cache set to something smaller, maybe 4gb, and report the size of RSS for mysqld when the prepare finishes?
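A simple way to capture that number while the prepare runs (a sketch; assumes a single mysqld process):

# Sample mysqld's resident set size (in KB) every 30 seconds:
while sleep 30; do ps -o rss= -p "$(pidof mysqld)"; done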
In my case I think the 12GB block cache left too little room on the host. I checked the RSS reported for mysqld in dmesg after it was OOM-killed, and it was ~14GB. I was able to successfully run the prepare with a 9GB block cache, and the RSS was ~11GB.
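For reference, the OOM killer's record of the kill, including the RSS it logged for the victim, can be read back from the kernel log:

# Show the OOM-killer events and the memory state recorded for the killed process:
dmesg | grep -iE 'out of memory|killed process'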
To summarize, switching to jemalloc fixes the issue.
I think it would be good for https://github.com/facebook/mysql-5.6/wiki/Build-Steps to mention that jemalloc must be used, or even to have a separate wiki page on malloc.
https://jira.mariadb.org/browse/MDEV-20406
@yoshinorim - is this possible when mysqld dies while RocksDB is starting to write a new WAL file? In that case it can be partially written.
I commented on the jira bug report. I believe they run with rocksdb_wal_recovery_mode=1, and mode 2 should open the database consistently.
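For context, rocksdb_wal_recovery_mode maps onto RocksDB's WALRecoveryMode values (a summary for readers, not taken from this thread):

# 0 = kTolerateCorruptedTailRecords (tolerate a corrupted tail record)
# 1 = kAbsoluteConsistency          (fail recovery on any corruption)
# 2 = kPointInTimeRecovery          (truncate at the first corruption, keep the consistent prefix)
# 3 = kSkipAnyCorruptedRecords      (skip corrupted records and keep going)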