Change default cache block size from 64 KiB to 2 or 4 MiB
Is your feature request related to a problem? Please describe. Currently the default block size for cache files is 64 KiB, which I think is too small for production. We have a large number of files that are gigabytes in size, and the number of files in the cache directory grows very fast. This also starts to contribute to random seeks for very small blocks. We did a quick benchmark and found that 2-4 MB works best for large deployments. A quick search suggests AWS also recommends block sizes of 2-8 MB.
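To make the file-count concern concrete, here is a rough back-of-the-envelope sketch. It assumes each cached block is stored as its own file on disk, and the 1 GiB file size is a hypothetical example, not a number from our benchmark.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def blocks_per_file(file_size: int, block_size: int) -> int:
    """Number of blocks needed to cover a file, rounding up (integer ceiling)."""
    return (file_size + block_size - 1) // block_size


# Hypothetical 1 GiB remote file, fully read through the cache.
file_size = 1 * GIB

for label, block_size in [("64 KiB", 64 * KIB), ("2 MiB", 2 * MIB), ("4 MiB", 4 * MIB)]:
    count = blocks_per_file(file_size, block_size)
    print(f"{label:>6} blocks -> {count} cache files")
```

Under that assumption, a single 1 GiB file needs 16384 blocks at 64 KiB but only 512 at 2 MiB, which is where the 32x difference in directory entries comes from.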
Describe the solution you'd like Change the default parameter. If you want, I can submit a quick PR.
Describe alternatives you've considered We have tested on S3 with a few block sizes, and 2-4 MB seems to work best for both performance and reducing the number of files.
Hi @tanejagagan I sincerely apologize for the slow response, and thank you for the interest and feedback!
There're a few things here:
- From my past experience, I agree 64KiB is too small, and 2MiB is the sweet spot for object storage.
- From a rollout perspective, it's generally not safe to bump the default value up too much at once (64KiB to 2MiB is 32 times larger), since it might affect latency for small requests.
- Do you mind if we make the change in two steps? For example, a first PR from 64KiB to 512KiB, then a second one from 512KiB to 2MiB if we don't see complaints from other users. Just want to reduce surprises. :)
- To solve your problem right away, could you please try the config cache_httpfs_cache_block_size? Most of the params for the extension are easily configurable.
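As a rough sketch of that last point, the setting could be applied from Python along these lines. This assumes the value is given in bytes and that the option is set with a regular SET statement after the extension is loaded; please double check both against the extension's docs. The helper names are made up for illustration.

```python
MIB = 1024 * 1024


def block_size_statement(block_size_bytes: int) -> str:
    """Build the SET statement for the cache block size (assumed to be in bytes)."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be a positive number of bytes")
    return f"SET cache_httpfs_cache_block_size = {block_size_bytes};"


def apply_block_size(con, block_size_bytes: int) -> None:
    """Run the statement on any connection-like object exposing execute(sql)."""
    con.execute(block_size_statement(block_size_bytes))


# Example: the statement for a 2MiB block size.
print(block_size_statement(2 * MIB))
```

With the duckdb Python package, `con` would be the object returned by `duckdb.connect()`, and `apply_block_size(con, 2 * MIB)` would be called once the extension has been loaded on that connection.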
> Change the default parameter. If you want i can submit a quick pr
Sure! Contributions are much appreciated! The default IO request size can be found here: https://github.com/dentiny/duck-read-cache-fs/blob/406236bfec3ec70e8464e10153416e9a03a4f499/src/include/cache_filesystem_config.hpp#L36
FYI:
- Feel free to reach out to me by [email protected]
- If you feel comfortable, you could also make a PR to add a "user" section