
Crash when using large storage mode with 100 GB capacity and 10 GB single cache file size

Open hopkings2008 opened this issue 1 year ago • 8 comments

Hi all, we hit a crash when using the large storage mode with 100 GB capacity and a 10 GB size for each cache file. The crash stack is below:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff67a98e4 in __GI_abort () at abort.c:79
#2  0x0000555557658eca in std::sys::pal::unix::abort_internal () at std/src/sys/pal/unix/mod.rs:372
#3  0x000055555576a9ca in std::process::abort () at std/src/process.rs:2394
#4  0x000055555765a791 in std::alloc::rust_oom () at std/src/alloc.rs:376
#5  0x000055555765a7b3 in std::alloc::_::__rg_oom () at std/src/alloc.rs:371
#6  0x000055555576c083 in alloc::alloc::handle_alloc_error::rt_error () at alloc/src/alloc.rs:383
#7  alloc::alloc::handle_alloc_error () at alloc/src/alloc.rs:389
#8  0x000055555576c064 in alloc::raw_vec::handle_error () at alloc/src/raw_vec.rs:788
#9  0x0000555556f10be7 in alloc::raw_vec::RawVecInner<A>::reserve::do_reserve_and_handle (slf=0x7ffedb1efc18, len=0, 
    additional=8726675783204887973, elem_layout=...) at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/alloc/src/raw_vec.rs:555
#10 0x0000555556f0e868 in alloc::raw_vec::RawVecInner<A>::reserve (self=0x7ffedb1efc18, len=0, additional=8726675783204887973, elem_layout=...)
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/alloc/src/raw_vec.rs:560
#11 alloc::raw_vec::RawVec<T,A>::reserve (self=0x7ffedb1efc18, len=0, additional=8726675783204887973)
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/alloc/src/raw_vec.rs:341
#12 alloc::vec::Vec<T,A>::reserve (self=0x7ffedb1efc18, additional=8726675783204887973)
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/alloc/src/vec/mod.rs:973
#13 0x0000555556f0e2c0 in alloc::vec::Vec<T,A>::extend_with (self=0x7ffedb1efc18, n=8726675783204887973, value=0)
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/alloc/src/vec/mod.rs:2694
#14 0x0000555556f0e783 in alloc::vec::Vec<T,A>::resize (self=0x7ffedb1efc18, new_len=8726675783204887973, value=0)
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/alloc/src/vec/mod.rs:2578
#15 0x00005555561bba0b in bincode::de::read::IoReader<R>::fill_buffer (self=0x7ffedb1efc18, length=8726675783204887973)
    at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/bincode-1.3.3/src/de/read.rs:144
#16 0x00005555561bc1d6 in <bincode::de::read::IoReader<R> as bincode::de::read::BincodeRead>::get_byte_buffer (self=0x7ffedb1efc18, 
    length=8726675783204887973) at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/bincode-1.3.3/src/de/read.rs:171
#17 0x00005555561b5921 in bincode::de::Deserializer<R,O>::read_vec (self=0x7ffedb1efc18)
    at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/bincode-1.3.3/src/de/mod.rs:96
#18 0x00005555561b51cb in bincode::de::Deserializer<R,O>::read_string (self=0x7ffedb1efc18)
    at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/bincode-1.3.3/src/de/mod.rs:100
#19 0x00005555561b669d in <&mut bincode::de::Deserializer<R,O> as serde::de::Deserializer>::deserialize_string (self=0x7ffedb1efc18, 
    visitor=...) at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/bincode-1.3.3/src/de/mod.rs:244
#20 0x0000555556128227 in serde::de::impls::<impl serde::de::Deserialize for alloc::string::String>::deserialize (deserializer=0x7ffedb1efc18)
    at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.215/src/de/impls.rs:704
#21 0x000055555606b8e6 in <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize (deserializer=0x7ffedb1efc18)
    at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.215/src/de/mod.rs:800
#22 0x00005555560e7db9 in bincode::internal::deserialize_from_custom_seed (seed=..., reader=..., options=...)
    at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/bincode-1.3.3/src/internal.rs:88
#23 0x00005555560e7c80 in bincode::internal::deserialize_from_seed (seed=..., reader=..., options=...)
    at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/bincode-1.3.3/src/internal.rs:65
#24 0x00005555560e79fc in bincode::internal::deserialize_from (reader=..., options=...)
    at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/bincode-1.3.3/src/internal.rs:55
#25 0x0000555556169f2b in bincode::config::Options::deserialize_from (reader=..., self=...)
    at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/bincode-1.3.3/src/config/mod.rs:229
#26 bincode::deserialize_from (reader=...) at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/bincode-1.3.3/src/lib.rs:129
#27 0x00005555560bf47f in foyer_storage::serde::EntryDeserializer::deserialize_key (buf=...)
    at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/foyer-storage-0.12.2/src/serde.rs:208
#28 0x00005555561a7e6d in foyer_storage::large::scanner::RegionScanner::next_key::{{closure}} ()
    at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/foyer-storage-0.12.2/src/large/scanner.rs:179
#29 0x0000555556139037 in foyer_storage::large::reclaimer::ReclaimRunner<K,V,S>::handle::{{closure}} ()
    at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/foyer-storage-0.12.2/src/large/reclaimer.rs:185
#30 0x0000555556134a7b in foyer_storage::large::reclaimer::ReclaimRunner<K,V,S>::run::{{closure}} ()
    at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/foyer-storage-0.12.2/src/large/reclaimer.rs:150
#31 0x000055555613e5d0 in foyer_storage::large::reclaimer::Reclaimer::open::{{closure}} ()
    at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/foyer-storage-0.12.2/src/large/reclaimer.rs:79

After investigating this problem, we found that the leading bytes of the deserializer's reader buffer are removed: the buffer is 64 bytes, but by the time deserialize_from_custom_seed is called, the 8 bytes at the beginning of the reader buffer have been cut off, so the remaining bytes are misread as the huge length seen in the backtrace (8726675783204887973).
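A small sketch of this failure mode (my reading of the backtrace, not a confirmed root cause): bincode 1.x with fixint encoding prefixes a String with its length as an 8-byte little-endian u64, so a reader positioned at the wrong offset decodes arbitrary payload bytes as that length.

```rust
fn main() {
    // The length observed in frame #15 of the backtrace.
    let len: u64 = 8726675783204887973;
    let bytes = len.to_le_bytes();
    // Any 8 payload bytes equal to `bytes` at the read position decode
    // to this length; IoReader::fill_buffer then calls
    // Vec::resize(len, 0), which aborts the process when the
    // allocation fails (the abort seen in frames #0-#8).
    assert_eq!(u64::from_le_bytes(bytes), len);
    println!("misread length prefix: {} bytes (~{} EiB)", len, len >> 60);
}
```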

Is there a limit on the single cache file size in foyer?

hopkings2008 avatar Nov 29 '24 03:11 hopkings2008

the foyer we used is "0.12.2"

hopkings2008 avatar Nov 29 '24 07:11 hopkings2008

Hi, @hopkings2008 . Thanks for reporting. The stack trace indicates an OOM. Would you like to share the configuration of your node instance and foyer? Let me check if there is something wrong on the foyer side. 🙏

MrCroxx avatar Nov 29 '24 07:11 MrCroxx

And, would you please also share the largest entry size in your workload? 10 GiB per cache file looks too large; each eviction op would invalidate 10% of the total cache capacity.

If there are no entries that large in your workload, a cache file size of 64 MiB would be enough.

MrCroxx avatar Nov 29 '24 07:11 MrCroxx

Hi, @hopkings2008 . Thanks for reporting. The stack trace indicates an OOM. Would you like to share the configuration of your node instance and foyer? Let me check if there is something wrong on the foyer side. 🙏

Hi MrCroxx, thank you for your quick response. Below is our detailed config:

let cache_result = exec.get_runtime().block_on(
    HybridCacheBuilder::new()
        .memory(1)
        .with_shards(16)
        .with_eviction_config(LruConfig::default())
        .with_object_pool_capacity(1024)
        .with_hash_builder(ahash::RandomState::default())
        .storage(Engine::Mixed(0.1))
        .with_device_options(
            DirectFsDeviceOptions::new("/tmp/cache_server")
                .with_capacity(102400 * 1024 * 1024) // 100 GiB total capacity
                .with_file_size(10240 * 1024 * 1024), // 10 GiB per cache file
        )
        .with_flush(true)
        .with_recover_mode(RecoverMode::None)
        .with_admission_picker(Arc::new(RateLimitPicker::new(100 * 1024 * 1024)))
        .with_compression(None)
        .with_runtime_options(RuntimeOptions::Separated {
            read_runtime_options: TokioRuntimeOptions {
                worker_threads: 8,
                max_blocking_threads: 16,
            },
            write_runtime_options: TokioRuntimeOptions {
                worker_threads: 8,
                max_blocking_threads: 16,
            },
        })
        .with_large_object_disk_cache_options(
            LargeEngineOptions::new()
                .with_indexer_shards(64)
                .with_recover_concurrency(8)
                .with_flushers(2)
                .with_reclaimers(2)
                .with_buffer_pool_size(256 * 1024 * 1024)
                .with_clean_region_threshold(4)
                .with_eviction_pickers(vec![Box::<FifoPicker>::default()])
                .with_reinsertion_picker(Arc::new(RateLimitPicker::new(10 * 1024 * 1024))),
        )
        .with_small_object_disk_cache_options(
            SmallEngineOptions::new()
                .with_set_size(16 * 1024)
                .with_set_cache_capacity(64)
                .with_flushers(2),
        )
        .build(),
);
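For context, the region math this configuration implies (plain arithmetic on the `with_capacity`/`with_file_size` values above; with the fs device, each file serves as one eviction region):

```rust
fn main() {
    let capacity: u64 = 102400 * 1024 * 1024; // 100 GiB, as configured
    let file_size: u64 = 10240 * 1024 * 1024; // 10 GiB, as configured
    let regions = capacity / file_size;
    assert_eq!(regions, 10);
    // Each reclaim drops a whole region, i.e. 10% of the cache at once.
    println!(
        "{} regions; each eviction invalidates {}% of capacity",
        regions,
        100 / regions
    );
}
```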

hopkings2008 avatar Nov 29 '24 07:11 hopkings2008

And, would you please also share the largest entry size in your workload? 10 GiB per cache file looks too large; each eviction op would invalidate 10% of the total cache capacity.

If there are no entries that large in your workload, a cache file size of 64 MiB would be enough.

The largest entry size in our workload is about 4 MiB, and we previously tested 64 MiB as the single file size; it works fine. What is the recommended single file size? If our largest entry is 4 MiB, what is the best single file size in our case? Thanks very much.

hopkings2008 avatar Nov 29 '24 07:11 hopkings2008

Hi, @hopkings2008 .

The largest entry size in our workload is about 4 MiB, and we previously tested 64 MiB as the single file size; it works fine. What is the recommended single file size? If our largest entry is 4 MiB, what is the best single file size in our case? Thanks very much.

For your workload, I think 64 MiB per file is enough and would work better than the 10 GiB setup, because foyer evicts the disk cache by region (one file, when using the fs device). 10 GiB out of a 100 GiB capacity means foyer evicts 10% of the data each time.
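Some back-of-the-envelope numbers for the suggested setup (my arithmetic, under the assumption that a region must hold at least one max-size entry):

```rust
fn main() {
    let capacity: u64 = 100 * 1024 * 1024 * 1024; // 100 GiB cache
    let max_entry: u64 = 4 * 1024 * 1024;         // 4 MiB largest entry
    let small_region: u64 = 64 * 1024 * 1024;     // suggested 64 MiB file
    // Entries must still fit within a single region.
    assert!(small_region >= max_entry);
    let regions = capacity / small_region;
    assert_eq!(regions, 1600);
    // Each eviction now invalidates 1/1600 (~0.06%) of the cache,
    // versus 1/10 (10%) with 10 GiB regions.
    println!(
        "{} regions; eviction granularity {:.3}%",
        regions,
        100.0 / regions as f64
    );
}
```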

MrCroxx avatar Dec 02 '24 09:12 MrCroxx

Hi, @hopkings2008 .

The largest entry size in our workload is about 4 MiB, and we previously tested 64 MiB as the single file size; it works fine. What is the recommended single file size? If our largest entry is 4 MiB, what is the best single file size in our case? Thanks very much.

For your workload, I think 64 MiB per file is enough and would work better than the 10 GiB setup, because foyer evicts the disk cache by region (one file, when using the fs device). 10 GiB out of a 100 GiB capacity means foyer evicts 10% of the data each time.

Does that mean the crash will happen whenever the single file size is large? And what is the root cause of the crash I posted above?

hopkings2008 avatar Dec 02 '24 09:12 hopkings2008

Does that mean the crash will happen whenever the single file size is large? And what is the root cause of the crash I posted above?

The OOM is unexpected. I'm investigating it. 🙌

MrCroxx avatar Dec 03 '24 03:12 MrCroxx