risinglight icon indicating copy to clipboard operation
risinglight copied to clipboard

storage: support direct I/O

Open skyzh opened this issue 2 years ago • 8 comments

Now we use buffered I/O, which might make benchmarking inaccurate.

  • [ ] Align blocks in column builder
  • [ ] Support direct I/O in column reader

skyzh avatar Jan 28 '22 12:01 skyzh

And it will also take up precious memory of the server...

skyzh avatar Jan 28 '22 12:01 skyzh

Let me have a try!

Xuanwo avatar Jan 29 '22 04:01 Xuanwo

Not all fs support O_DIRECT, such as tmpfs which we have tested over it.

test storage::secondary::column::primitive_column_factory::tests::test_scan_rle_i32 ... FAILED
test storage::secondary::rowset::rowset_iterator::tests::test_rowset_iterator_with_filter ... FAILED
test storage::secondary::column::primitive_column_factory::tests::test_scan_i32 ... FAILED
test storage::secondary::rowset::rowset_iterator::tests::test_rowset_iterator ... FAILED
test storage::secondary::column::primitive_column_factory::tests::test_skip ... FAILED
test storage::secondary::column::primitive_column_factory::tests::test_skip_multiple_times ... FAILED
test storage::secondary::rowset::disk_rowset::tests::test_get_block ... FAILED
test storage::secondary::rowset::rowset_writer::tests::test_rowset_flush ... ok

failures:

---- storage::secondary::column::primitive_column_factory::tests::test_scan_rle_i32 stdout ----
thread 'storage::secondary::column::primitive_column_factory::tests::test_scan_rle_i32' panicked at 'called `Result::unwrap()` on an `Err` value: Io(Os { code: 22, kind: InvalidInput, message: "Invalid argument" })
disabled backtrace', src/storage/secondary/rowset/disk_rowset.rs:243:14

---- storage::secondary::rowset::rowset_iterator::tests::test_rowset_iterator_with_filter stdout ----
thread 'storage::secondary::rowset::rowset_iterator::tests::test_rowset_iterator_with_filter' panicked at 'called `Result::unwrap()` on an `Err` value: Io(Os { code: 22, kind: InvalidInput, message: "Invalid argument" })
disabled backtrace', src/storage/secondary/rowset/disk_rowset.rs:198:14

Here are possible solutions:

  • Check the fs capability while starting up.
  • Add a new flag or feature.
  • Remove O_DIRECT if we meet error while open.

Xuanwo avatar Feb 26 '22 02:02 Xuanwo

I would prefer making O_DIRECT part of the I/O engine.

https://github.com/risinglightdb/risinglight/blob/2e1b9fc53b5e4215b9fc994b77f1c8780a4196df/src/storage/secondary/options.rs#L8-L15

We can add a new DirectPositionedRead engine, and use that engine by default. For tmpfs and unit tests, we still use the original PositionedRead engine without direct I/O.

Thanks for your contribution!

skyzh avatar Feb 26 '22 02:02 skyzh

So we don't need to care about index here? It seems that IoBackend only affects column files:

let column = Column::new(
    ColumnIndex::from_bytes(&index_content)?,
    match io_backend {
        IOBackend::NormalRead => {
            ColumnReadableFile::NormalRead(Arc::new(Mutex::new(file.into_std().await)))
        }
        IOBackend::PositionedRead => {
            ColumnReadableFile::PositionedRead(Arc::new(file.into_std().await))
        }
    },
    block_cache.clone(),
    BlockCacheKey::default().rowset(rowset_id).column(id as u32),
);

Xuanwo avatar Feb 26 '22 04:02 Xuanwo

Yes. Index is small, and I'd like to ignore it for I/O backends. If in the future it would become large (like the index in RocksDB), we can migrate it to use the same I/O backend as column blocks.

skyzh avatar Feb 26 '22 04:02 skyzh

Shall we support direct I/O on MacOS? We can disable the cache with fcntl(fd, F_NOCACHE, 1).

arkbriar avatar Mar 01 '22 03:03 arkbriar

I have lost interest in this issue. Feel free to take this~

Previous work: https://github.com/risinglightdb/risinglight/pull/532

Xuanwo avatar Jul 28 '22 02:07 Xuanwo