Add runtime config options for `list_files_cache_limit` and `list_files_cache_ttl`
Which issue does this PR close?
- Closes #19056
Rationale for this change
Make the list file cache memory limit and TTL configurable via runtime config.
What changes are included in this PR?
- Add ability to
SETandRESETlist_files_cache_limitandlist_files_cache_ttl list_files_cache_ttlwill expect the duration to look like either1m30sor30(I'm wondering if it would be simpler for it to just accept a single unit?)- Add
update_cache_ttl()to theListFilesCachetrait so we can update it fromRuntimeEnvBuilder::build() - Add config entries
Are these changes tested?
Yes
Are there any user-facing changes?
update_cache_ttl() has been added to the ListFilesCache trait
Hi @BlakeOrth @alamb, this is ready for review
@BlakeOrth Yep, you can run reset datafusion.runtime.list_files_cache_ttl to set it back to None
Thanks @delamarch3 and @BlakeOrth -- I'll try and check this out soon
This confused me for a while but it makes sense to me and will be fixed with
Ah sorry, should have been more clear, I meant in the datafusion-cli. I configured it like this to test it out:
diff --git a/datafusion-cli/src/main.rs b/datafusion-cli/src/main.rs
index de666fced..abed30ea3 100644
--- a/datafusion-cli/src/main.rs
+++ b/datafusion-cli/src/main.rs
@@ -23,6 +23,8 @@ use std::process::ExitCode;
use std::sync::{Arc, LazyLock};
use datafusion::error::{DataFusionError, Result};
+use datafusion::execution::cache::cache_manager::CacheManagerConfig;
+use datafusion::execution::cache::DefaultListFilesCache;
use datafusion::execution::context::SessionConfig;
use datafusion::execution::memory_pool::{
FairSpillPool, GreedyMemoryPool, MemoryPool, TrackConsumersPool,
@@ -222,6 +224,11 @@ async fn main_inner() -> Result<()> {
);
rt_builder = rt_builder.with_object_store_registry(instrumented_registry.clone());
+ rt_builder = rt_builder.with_cache_manager(
+ CacheManagerConfig::default()
+ .with_list_files_cache(Some(Arc::new(DefaultListFilesCache::default()))),
+ );
+
let runtime_env = rt_builder.build_arc()?;
// enable dynamic file query
This confused me for a while but it makes sense to me and will be fixed with
Ah sorry, should have been more clear, I meant in the datafusion-cli. I configured it like this to test it out:
diff --git a/datafusion-cli/src/main.rs b/datafusion-cli/src/main.rs index de666fced..abed30ea3 100644 --- a/datafusion-cli/src/main.rs +++ b/datafusion-cli/src/main.rs @@ -23,6 +23,8 @@ use std::process::ExitCode; use std::sync::{Arc, LazyLock}; use datafusion::error::{DataFusionError, Result}; +use datafusion::execution::cache::cache_manager::CacheManagerConfig; +use datafusion::execution::cache::DefaultListFilesCache; use datafusion::execution::context::SessionConfig; use datafusion::execution::memory_pool::{ FairSpillPool, GreedyMemoryPool, MemoryPool, TrackConsumersPool, @@ -222,6 +224,11 @@ async fn main_inner() -> Result<()> { ); rt_builder = rt_builder.with_object_store_registry(instrumented_registry.clone()); + rt_builder = rt_builder.with_cache_manager( + CacheManagerConfig::default() + .with_list_files_cache(Some(Arc::new(DefaultListFilesCache::default()))), + ); + let runtime_env = rt_builder.build_arc()?; // enable dynamic file query
No worries -- I think it is a temporary situation so we should be good to go
Thanks @delamarch3 and @BlakeOrth