
Long-lived Iterator pins flushed MemTables and compacted SSTs

Open rockeet opened this issue 1 year ago • 3 comments

Expected behavior

Flushed MemTables and compacted SSTs should be deleted.

Actual behavior

Long-lived Iterators hold references to Version objects, which in turn reference MemTables and compacted SSTs. These cannot be freed while the Iterator is alive, which leads to excessive memory consumption while writing to the DB.

Steps to reproduce the behavior

Create an Iterator and keep using it while writing to the DB; newly created MemTables cannot be freed.
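The pinning chain described above can be illustrated without RocksDB at all. The following is a minimal sketch using only the C++ standard library; `ToyMemTable`, `ToyVersion`, `ToyIterator`, `ToyDB` and `LiveMemTablesAfterFlushes` are hypothetical names invented for this illustration and are not RocksDB classes or APIs. It assumes a simplified model in which each flush publishes a new version holding one fresh memtable, and an iterator keeps the version it was created on alive through a `std::shared_ptr`.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Number of toy memtables currently allocated (illustration only).
int g_live_memtables = 0;

// Hypothetical stand-in for a MemTable; only tracks its own lifetime.
struct ToyMemTable {
    ToyMemTable() { ++g_live_memtables; }
    ~ToyMemTable() { --g_live_memtables; }
    ToyMemTable(const ToyMemTable&) = delete;
    ToyMemTable& operator=(const ToyMemTable&) = delete;
};

// Hypothetical stand-in for a Version: owns references to its memtables.
struct ToyVersion {
    std::vector<std::shared_ptr<ToyMemTable>> memtables;
};

// Hypothetical iterator: pins the version that was current at creation.
struct ToyIterator {
    std::shared_ptr<const ToyVersion> pinned;
};

class ToyDB {
public:
    ToyDB() { InstallFreshVersion(); }

    ToyIterator NewIterator() const { return ToyIterator{current_}; }

    // Models "write until the memtable is full, then flush": the new
    // version no longer references the old memtable.
    void WriteAndFlush() { InstallFreshVersion(); }

private:
    void InstallFreshVersion() {
        auto next = std::make_shared<ToyVersion>();
        next->memtables.push_back(std::make_shared<ToyMemTable>());
        // The previous version is destroyed here only if nothing pins it.
        current_ = std::move(next);
    }

    std::shared_ptr<const ToyVersion> current_;
};

// Returns how many toy memtables are still alive after `flushes` flushes,
// with or without an iterator held open for the whole time.
int LiveMemTablesAfterFlushes(bool hold_iterator, int flushes) {
    ToyDB db;
    std::unique_ptr<ToyIterator> it;
    if (hold_iterator) {
        it.reset(new ToyIterator(db.NewIterator()));
    }
    for (int i = 0; i < flushes; ++i) {
        db.WriteAndFlush();
    }
    return g_live_memtables;
}
```

In this toy model, calling `LiveMemTablesAfterFlushes(false, 5)` leaves only the active memtable alive, while `LiveMemTablesAfterFlushes(true, 5)` additionally keeps the memtable reachable from the pinned version. The real engine has more layers between an Iterator and a MemTable, so treat this only as a picture of the reference chain, not of RocksDB internals.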


We noticed this issue while running sysbench on MyRocks. We have also filed an issue and a PR for MyRocks, where we worked around the problem by periodically creating a new Iterator and deleting the old one. It would be better to resolve this in RocksDB itself.
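The workaround of periodically replacing the Iterator can be sketched in the same spirit. This is a standalone toy, not the MyRocks patch: `ToyStore`, `MakeStore`, `ScanStats` and `Scan` are hypothetical names, and the store is assumed to be a simple copy-on-write `std::map` in which every write publishes a new immutable table. The scan drops its pinned view every `refresh_every` keys and resumes just after the last key it returned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

using Table = std::map<int, int>;

// Hypothetical copy-on-write store: each Put publishes a new table version.
class ToyStore {
public:
    void Put(int key, int value) {
        auto next = std::make_shared<Table>(*current_);
        (*next)[key] = value;
        current_ = std::move(next);
        ++version_;
    }
    // Pins the current table; it stays alive while the pointer is held.
    std::shared_ptr<const Table> Pin() const { return current_; }
    std::size_t version() const { return version_; }

private:
    std::shared_ptr<const Table> current_ = std::make_shared<Table>();
    std::size_t version_ = 0;
};

// Builds a store holding keys 0..n-1 (helper for the example).
ToyStore MakeStore(int n) {
    ToyStore store;
    for (int k = 0; k < n; ++k) store.Put(k, 0);
    return store;
}

struct ScanStats {
    std::vector<int> keys;    // keys visited, in order
    std::size_t max_lag = 0;  // most versions the pinned view fell behind
};

// Scans every key. After each `refresh_every` keys (0 means never) the
// pinned view is released and re-acquired, resuming after the last key.
ScanStats Scan(ToyStore& store, std::size_t refresh_every) {
    ScanStats stats;
    auto view = store.Pin();
    std::size_t pinned_version = store.version();
    std::size_t since_refresh = 0;
    auto it = view->begin();
    while (it != view->end()) {
        const int key = it->first;
        stats.keys.push_back(key);
        store.Put(key, it->second + 1);  // simulated concurrent write
        stats.max_lag =
            std::max(stats.max_lag, store.version() - pinned_version);
        if (refresh_every != 0 && ++since_refresh == refresh_every) {
            view = store.Pin();           // old view is released here
            pinned_version = store.version();
            it = view->upper_bound(key);  // resume after the last key
            since_refresh = 0;
        } else {
            ++it;
        }
    }
    return stats;
}
```

With six keys, `Scan(store, 0)` lets the pinned view fall six versions behind, while `Scan(store, 2)` never falls more than two behind and still visits every key. The cost of this design is that the scan no longer reads from one consistent point in time, which is presumably why an iterator refresh combined with a snapshot is discussed later in the thread.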

rockeet avatar Aug 17 '22 06:08 rockeet

I agree it would be great to solve this in RocksDB. While it isn't for me to decide, I am not sure that will happen given the complexity.

For all of the MVCC engines that I understand, a long-lived transaction isn't free. It blocks vacuum in Postgres, leading to much space-amp. It blocks purge in InnoDB, leading to much space-amp. And here you describe the cost of it for RocksDB.

The workaround (or solution) for Postgres and InnoDB is to kill long-open transactions when they become a problem. Since this issue was inspired by a MyRocks problem, that solution works there as well. But it is only a solution for long-open transactions created by users, and in your case the problem is a long-open iterator used during index creation.

So my solution implies you can only avoid the problem in MyRocks if you don't create secondary indexes on existing large tables, or use OSC to do that. Unfortunately that isn't a great workaround.

I appreciate that you filed this and provided a PR for MyRocks to avoid this during create index. But again, I am not sure that RocksDB will fix this.

mdcallag avatar Aug 17 '22 13:08 mdcallag

Is this related to #10487? It seems doing an iterator refresh with snapshot could free the pinned resources.

cbi42 avatar Aug 17 '22 16:08 cbi42

> Is this related to #10487? It seems doing an iterator refresh with snapshot could free the pinned resources.

Yes, this issue is related to #10487, but #10487 requires a user API change.

This issue should be solved without a user API change. BTW, if #10487 is resolved, this issue can be solved on top of it.

rockeet avatar Aug 18 '22 03:08 rockeet