Respect Cache-Control Headers
Does this caching extension respect HTTP cache control headers and perform re-validation? If not, would you be open to a PR that makes it do so?
For details about cache control headers:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Cache-Control
Thanks,
Rusty
Yes, this is something I've considered but didn't implement: https://github.com/dentiny/duck-read-cache-fs/issues/257
- I'm aware the DuckDB external file cache does validation by default, and provides an internal option to skip it
- I understand it's beneficial, but didn't treat it as high-priority based on my own experience
- I rarely see people overwriting the same object with different content
- I do want to propose that DuckDB expose an extra parameter to allow disabling validation
- Implementation-wise, apart from validation before read, invalidation before/after write is also necessary
I will try to implement it this week. The extension's features are mostly request-driven, and my initial goal was only to provide a read cache. I'm always happy to take feature requests!
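The two pieces mentioned in the list above (validation before a read, invalidation around a write) can be sketched roughly as follows. This is a minimal illustration, not the extension's actual implementation: `ValidatingCache`, `VersionTag`, and the two fetch callbacks are hypothetical names, and the version tag is assumed to be something like an ETag plus a last-modified time reported by the remote store.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical metadata a remote store might expose for an object
// (assumption: an ETag and/or a last-modified timestamp is available).
struct VersionTag {
    std::string etag;
    long last_modified = 0;
    bool operator==(const VersionTag &other) const {
        return etag == other.etag && last_modified == other.last_modified;
    }
};

struct CacheEntry {
    VersionTag tag;
    std::string content;
};

// Hypothetical read cache keyed by path, with optional validation.
class ValidatingCache {
public:
    using FetchTag = std::function<VersionTag(const std::string &)>;
    using FetchBody = std::function<std::string(const std::string &)>;

    ValidatingCache(FetchTag fetch_tag, FetchBody fetch_body, bool validate)
        : fetch_tag_(std::move(fetch_tag)), fetch_body_(std::move(fetch_body)), validate_(validate) {}

    // Validation before read: serve the cached copy only if validation is
    // off or the remote version tag still matches the one stored with it.
    std::string Read(const std::string &path) {
        auto it = entries_.find(path);
        if (it != entries_.end()) {
            if (!validate_ || fetch_tag_(path) == it->second.tag) {
                return it->second.content;
            }
            entries_.erase(it); // stale entry, drop it and refetch below
        }
        CacheEntry entry{fetch_tag_(path), fetch_body_(path)};
        std::string content = entry.content;
        entries_[path] = std::move(entry);
        return content;
    }

    // Invalidation on write: call before/after writing through to the
    // remote so later reads never see the pre-write content.
    void InvalidateOnWrite(const std::string &path) { entries_.erase(path); }

private:
    FetchTag fetch_tag_;
    FetchBody fetch_body_;
    bool validate_;
    std::unordered_map<std::string, CacheEntry> entries_;
};
```

With validation off, a cached entry is served until something calls `InvalidateOnWrite`, which is why the write path matters even when nobody overwrites objects out of band. With validation on, every read costs one extra metadata request, which is the trade-off behind making it optional.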
Hi @dentiny,
I'm also looking for it to support Windows. I think it can make Power BI work well with DuckDB querying remote files.
I agree files are rarely overwritten, but if it's standards-conforming, it's so much easier to not worry about things.
Rusty
> I agree files are rarely overwritten, but if it's standards-conforming, it's so much easier to not worry about things.
Sure! :)
> I'm also looking for it to support Windows. I think it can make Power BI work well with DuckDB querying remote files.
I considered that as well, and Mim also wants it. The main concern is that my current implementation relies on filesystem behavior: https://github.com/dentiny/duck-read-cache-fs/blob/075e8f7c04bd2c4ea1cab2d655b0fee3826a37a5/src/disk_cache_reader.cpp#L132
- From my limited knowledge, Windows by default doesn't allow deleting a file while another thread or process is accessing it
- On Unix, deleting a file is just a reference count decrement, and physical deletion happens when the ref count drops to 0
- A seemingly plausible way is FILE_SHARE_DELETE, as stated in the doc:
  > The DeleteFile function fails if an application attempts to delete a file that has other handles open for normal I/O or as a memory-mapped file (FILE_SHARE_DELETE must have been specified when other handles were opened).
  But that involves a bigger change, since the underlying std::remove doesn't really work in such a case.
https://en.cppreference.com/w/cpp/io/c/remove.html
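The Unix behavior described above can be illustrated with a small POSIX-only sketch: a reader keeps its descriptor open, the path is unlinked (as a cache eviction might do), and the read still succeeds. This is not the code at the linked line; `ReadAfterUnlink` and `UnlinkResult` are hypothetical names, and the snippet assumes a POSIX system where `open`, `unlink`, `access`, and `read` are available.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical result type for the demo below.
struct UnlinkResult {
    bool ok = false;        // every syscall succeeded
    bool path_gone = false; // the path no longer exists after unlink
    std::string content;    // bytes read through the still-open descriptor
};

// Writes `payload` to `path`, opens it for reading, unlinks the path while
// the descriptor is still open, then reads the data back via the descriptor.
UnlinkResult ReadAfterUnlink(const std::string &path, const std::string &payload) {
    UnlinkResult result;

    int wfd = ::open(path.c_str(), O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (wfd < 0) {
        return result;
    }
    ssize_t written = ::write(wfd, payload.data(), payload.size());
    ::close(wfd);
    if (written != static_cast<ssize_t>(payload.size())) {
        ::unlink(path.c_str());
        return result;
    }

    int rfd = ::open(path.c_str(), O_RDONLY);
    if (rfd < 0) {
        ::unlink(path.c_str());
        return result;
    }

    // "Eviction": remove the directory entry while a reader is still open.
    if (::unlink(path.c_str()) != 0) {
        ::close(rfd);
        return result;
    }
    result.path_gone = (::access(path.c_str(), F_OK) != 0);

    // The data stays reachable through the open descriptor until it is closed.
    std::string buffer(payload.size(), '\0');
    ssize_t n = ::read(rfd, &buffer[0], buffer.size());
    ::close(rfd);
    if (n < 0) {
        return result;
    }
    buffer.resize(static_cast<size_t>(n));
    result.content = buffer;
    result.ok = true;
    return result;
}
```

On Windows the equivalent delete would fail while the reader's handle is open unless that handle was opened with FILE_SHARE_DELETE, which is the behavioral gap a port has to work around.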
A bigger concern is maintenance overhead: I had a bad experience with Windows when I was working on Ray. Let me check with @douenergy and get back to you. :)
Updates to this issue:
- I added cache entry validation for both the in-memory and on-disk cache, off by default
- Currently the validation mimics the check done by the external file cache
- I see two current blockers for Windows support: one for MSVC compilation (will do before the next minor version release), another for OS FS behavior; the latter has been completed here and should be released in v1.5.0