duck-read-cache-fs

Respect Cache-Control Headers

Open rustyconover opened this issue 1 month ago • 3 comments

Does this caching extension respect HTTP Cache-Control headers and perform revalidation? If not, would you be open to a PR that makes it do so?

For details about cache control headers:

https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Cache-Control

Thanks,

Rusty

rustyconover, Nov 07 '25 15:11
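
As a rough illustration of what respecting Cache-Control could involve, the sketch below parses three directives (no-store, no-cache, max-age) and decides whether a cached entry may be served or should be revalidated. All names here (CacheControl, ParseCacheControl, Decide) are hypothetical and not part of duck-read-cache-fs. The sketch also revalidates whenever max-age is absent, which is a conservative simplification of the HTTP caching rules.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <chrono>
#include <sstream>
#include <string>

// Hypothetical types for illustration only; not part of duck-read-cache-fs.
struct CacheControl {
    bool no_store = false;     // response must not be cached
    bool no_cache = false;     // cached copy must be revalidated before each use
    bool has_max_age = false;  // true if a well-formed max-age directive was seen
    std::chrono::seconds max_age{0};
};

enum class CacheDecision { kServeFromCache, kRevalidate, kDoNotCache };

// Trim surrounding whitespace and lowercase (directive names are case-insensitive).
inline std::string NormalizeDirective(std::string s) {
    auto not_space = [](unsigned char c) { return std::isspace(c) == 0; };
    s.erase(s.begin(), std::find_if(s.begin(), s.end(), not_space));
    s.erase(std::find_if(s.rbegin(), s.rend(), not_space).base(), s.end());
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Parse a comma-separated Cache-Control value; unknown directives are ignored.
inline CacheControl ParseCacheControl(const std::string &header) {
    CacheControl cc;
    const std::string prefix = "max-age=";
    std::istringstream stream(header);
    std::string directive;
    while (std::getline(stream, directive, ',')) {
        directive = NormalizeDirective(directive);
        if (directive == "no-store") {
            cc.no_store = true;
        } else if (directive == "no-cache") {
            cc.no_cache = true;
        } else if (directive.compare(0, prefix.size(), prefix) == 0) {
            const std::string value = directive.substr(prefix.size());
            const bool digits_only =
                !value.empty() && value.size() <= 18 &&
                std::all_of(value.begin(), value.end(),
                            [](unsigned char c) { return std::isdigit(c) != 0; });
            if (digits_only) {
                cc.has_max_age = true;
                cc.max_age = std::chrono::seconds(std::stoll(value));
            }
        }
    }
    return cc;
}

// Decide what to do with a cached entry of the given age.
inline CacheDecision Decide(const CacheControl &cc, std::chrono::seconds age) {
    assert(age.count() >= 0);
    if (cc.no_store) return CacheDecision::kDoNotCache;
    if (cc.no_cache) return CacheDecision::kRevalidate;
    if (cc.has_max_age && age < cc.max_age) return CacheDecision::kServeFromCache;
    return CacheDecision::kRevalidate;  // stale, or no explicit lifetime given
}
```

With a header of "public, max-age=60", an entry that is 10 seconds old would be served from cache, while one that is 60 seconds old would be revalidated.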

Yes, this is something I've considered but didn't implement: https://github.com/dentiny/duck-read-cache-fs/issues/257

  • I'm aware that DuckDB's external file cache does validation by default, and provides an internal option to skip it
  • I understand it's beneficial, but I didn't treat it as high-priority based on my own experience
    • I rarely see people overwriting the same object with different content
    • I do want to propose that DuckDB expose an extra parameter to allow disabling validation
  • Implementation-wise, apart from validation before read, invalidation before/after write is also necessary

I will try to implement it this week -- the extension's features are mostly request-driven, and my initial goal was only to provide a read cache. I'm always happy to take feature requests!

dentiny, Nov 07 '25 15:11
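
The two pieces mentioned above, validation before read and invalidation around writes, can be sketched with a small in-memory cache. This is a minimal sketch under stated assumptions, not the extension's implementation: ValidatingCache, RemoteFileInfo, and CacheEntry are hypothetical names, and "validation" is assumed here to mean comparing a version tag (such as an ETag) when one is available, and the last-modified time otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical metadata a remote filesystem might report for an object.
struct RemoteFileInfo {
    std::string version_tag;    // e.g. an HTTP ETag; empty if unavailable
    int64_t last_modified = 0;  // seconds since epoch
};

struct CacheEntry {
    std::string data;
    RemoteFileInfo info;  // metadata captured when the entry was filled
};

class ValidatingCache {
public:
    // Validation before read: return the cached bytes only if the stored
    // metadata still matches `current`; otherwise drop the stale entry.
    const std::string *Get(const std::string &path, const RemoteFileInfo &current) {
        auto it = entries_.find(path);
        if (it == entries_.end()) return nullptr;
        if (!IsValid(it->second.info, current)) {
            entries_.erase(it);
            return nullptr;
        }
        return &it->second.data;
    }

    void Put(const std::string &path, std::string data, const RemoteFileInfo &info) {
        assert(!path.empty());
        entries_[path] = CacheEntry{std::move(data), info};
    }

    // Invalidation around writes: call before and/or after writing `path`.
    void Invalidate(const std::string &path) { entries_.erase(path); }

    std::size_t Size() const { return entries_.size(); }

private:
    // Assumed rule: prefer the version tag when either side has one,
    // fall back to the last-modified timestamp otherwise.
    static bool IsValid(const RemoteFileInfo &cached, const RemoteFileInfo &current) {
        if (!cached.version_tag.empty() || !current.version_tag.empty()) {
            return cached.version_tag == current.version_tag;
        }
        return cached.last_modified == current.last_modified;
    }

    std::unordered_map<std::string, CacheEntry> entries_;
};
```

The cost of this design is one metadata lookup per read to obtain `current`, which is presumably why an option to disable validation is attractive when objects are never overwritten.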

Hi @dentiny,

I'm also looking for it to support Windows. I think it can make Power BI work well with DuckDB querying remote files.

I agree rarely are files overwritten - but if it's standards-conforming, it's so much easier to not worry about things.

Rusty

rustyconover, Nov 07 '25 21:11

> I agree rarely are files overwritten - but if it's standards-conforming, it's so much easier to not worry about things.

Sure! :)

> I'm also looking for it to support Windows. I think it can make Power BI work well with DuckDB querying remote files.

I considered that as well; Mim also wants it. The main concern is that my current implementation relies on filesystem behavior: https://github.com/dentiny/duck-read-cache-fs/blob/075e8f7c04bd2c4ea1cab2d655b0fee3826a37a5/src/disk_cache_reader.cpp#L132

  • Windows, from my limited knowledge, doesn't by default allow deleting a file while another thread/process is accessing it
  • On Unix, deleting a file is just a reference count decrement, with physical deletion happening when the ref count drops to 0
  • A seemingly plausible way is FILE_SHARE_DELETE, as stated in the doc:

> The DeleteFile function fails if an application attempts to delete a file that has other handles open for normal I/O or as a memory-mapped file (FILE_SHARE_DELETE must have been specified when other handles were opened).

but that involves a bigger change, since the underlying std::remove doesn't really work in that case: https://en.cppreference.com/w/cpp/io/c/remove.html

A bigger concern is maintenance overhead: I had a bad experience with Windows when I was working on Ray. Let me check with @douenergy and get back to you. :)

dentiny, Nov 07 '25 22:11
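
The filesystem behavior described above can be observed with a small standalone program. This is a sketch of the general POSIX semantics only, not the extension's code: ReadAfterRemove and RemoveResult are hypothetical names. On a POSIX system, std::remove is expected to succeed while the file is still open and the open handle keeps reading the data; on Windows, the same std::remove call is expected to fail while the stream is open, which is the porting problem.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical result type for this demonstration.
struct RemoveResult {
    bool removed_while_open;  // did std::remove succeed while a handle was open?
    std::string data;         // bytes read through the handle after the removal attempt
};

// Write `content` to `path`, open it for reading, try to delete it while the
// handle is still open, then read through that handle.
inline RemoveResult ReadAfterRemove(const std::string &path, const std::string &content) {
    {
        std::ofstream out(path, std::ios::binary);
        out << content;
    }  // writer is closed here, so only the reader below holds the file open

    std::ifstream in(path, std::ios::binary);
    assert(in.is_open());

    RemoveResult result;
    // POSIX: unlinks the name; the data stays alive until the last handle closes.
    result.removed_while_open = (std::remove(path.c_str()) == 0);

    result.data.assign(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
    in.close();

    // If removal failed while the file was open, clean up now that it is closed.
    if (!result.removed_while_open) {
        std::remove(path.c_str());
    }
    return result;
}
```

A cache that evicts on-disk blocks by deleting them while readers may still hold them open gets the POSIX behavior for free, whereas a Windows port would need either FILE_SHARE_DELETE on every open or a deferred-deletion scheme.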

Updates to this issue:

  • I added cache entry validation for both the in-memory and on-disk caches, off by default
    • Currently the validation mimics the check done by the external file cache
  • I see two current blockers for Windows support: one is MSVC compilation (will do before the next minor version release), the other is OS filesystem behavior; the latter has been completed here and should be released in v1.5.0

dentiny, Dec 08 '25 08:12