ETags: mismatch detection and Access-Control-Expose-Headers
If a dataset is updated on object storage while a client has a directory cached, the cached offsets and lengths become invalid and the PMTiles client will fetch garbage for tiles.
We can detect this situation via ETag, which raises two questions:
- do all storage platforms allow this tag to be exposed via CORS?
- how many retries should we allow? (assuming datasets change infrequently)
After a cursory amount of exploration:
- Major providers (AWS S3, Google Cloud Storage, Azure Blob Storage) have a way to set a CORS policy, including exposing additional headers, through their primary settings interface (AWS UI, Azure UI, gsutil)
- Smaller providers (DigitalOcean, Scaleway, Backblaze) don't have a documented way to expose additional CORS headers, but it should be doable via an S3-compatible client and API
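To give a rough idea of what exposing the header involves, here is a sketch that builds an S3-style CORS rule and serializes it to JSON. The field names follow the AWS S3 CORS configuration format; the helper name `buildCorsRules` and the origin are placeholders, and whether each smaller provider accepts the same document through its S3-compatible API is an assumption that needs checking per provider.

```typescript
// Shape of one rule in an S3-style CORS configuration (AWS JSON format).
interface CorsRule {
  AllowedOrigins: string[];
  AllowedMethods: string[];
  AllowedHeaders: string[];
  ExposeHeaders: string[];
  MaxAgeSeconds?: number;
}

// Hypothetical helper: builds a rule set that lets browser clients issue
// Range requests and read the ETag header of the response.
function buildCorsRules(origins: string[]): CorsRule[] {
  return [
    {
      AllowedOrigins: origins,
      AllowedMethods: ["GET", "HEAD"],
      AllowedHeaders: ["Range"],
      ExposeHeaders: ["ETag"],
      MaxAgeSeconds: 3000,
    },
  ];
}

// "https://example.com" is a placeholder origin, not a real deployment.
const corsJson = JSON.stringify(buildCorsRules(["https://example.com"]), null, 2);
console.log(corsJson);
```

ETag is not among the response headers that CORS exposes to scripts by default, which is why it has to be listed under `ExposeHeaders` explicitly.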
Conclusion: the PMTiles client should progressively support ETags. It shouldn't break if the header is not exposed, but should perhaps log a warning.
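A minimal sketch of what "progressive" could mean in practice, assuming the client reads headers through something shaped like the Fetch API's `Headers`. The names `HeaderBag`, `readEtag` and `isStale` are made up for illustration and are not part of the PMTiles client.

```typescript
// Minimal view of response headers; the Fetch API's Headers object satisfies it.
interface HeaderBag {
  get(name: string): string | null;
}

// Tracks whether the warning was already logged, to avoid one per request.
let warnedMissingEtag = false;

// Hypothetical helper: returns the ETag if the platform exposes it,
// otherwise warns once and returns undefined so callers skip the check.
function readEtag(headers: HeaderBag): string | undefined {
  const etag = headers.get("ETag");
  if (etag === null) {
    if (!warnedMissingEtag) {
      console.warn("ETag not exposed; stale-directory detection is disabled");
      warnedMissingEtag = true;
    }
    return undefined;
  }
  return etag;
}

// A mismatch can only be asserted when both values are known, so a
// missing ETag on either side never counts as stale.
function isStale(cachedEtag: string | undefined, responseEtag: string | undefined): boolean {
  return cachedEtag !== undefined && responseEtag !== undefined && cachedEtag !== responseEtag;
}
```

With this shape, a bucket that does not expose the header behaves exactly as it does today, only with a single warning in the console.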
We need to consider the side effects of caching the root directory + metadata as well.
If we are using ETag-based expiration, a tile request made with a stale directory will be detected correctly, since the ETag returned with the tile will differ from the one cached alongside the directory.
If we are only requesting the metadata for a stale directory, it will be served straight from cache (as intended), but there is no mechanism for invalidating it other than requesting a tile.
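The tile path could look roughly like the sketch below: compare the ETag that comes back with the tile bytes against the one stored with the cached directory entry, and on a mismatch reload the directory and try again a bounded number of times. `GetBytes`, `CachedEntry`, `loadEntry` and the `maxRetries` default of 1 are all assumptions for illustration; the right retry count is still an open question.

```typescript
// Result of one ranged read: the bytes plus the ETag, if the server exposed one.
interface RangeResult {
  data: Uint8Array;
  etag?: string;
}

// Hypothetical source abstraction; a real one would issue an HTTP Range request.
type GetBytes = (offset: number, length: number) => Promise<RangeResult>;

// Cached directory entry for one tile, tagged with the ETag it was read under.
interface CachedEntry {
  offset: number;
  length: number;
  etag?: string;
}

// Reads a tile; on ETag mismatch, reloads the directory entry and tries
// again, giving up after maxRetries reloads.
async function getTile(
  getBytes: GetBytes,
  loadEntry: () => Promise<CachedEntry>, // re-reads the directory from the source
  cached: CachedEntry,
  maxRetries = 1
): Promise<Uint8Array> {
  let entry = cached;
  for (let attempt = 0; ; attempt++) {
    const res = await getBytes(entry.offset, entry.length);
    // Only treat it as a mismatch when both ETags are known.
    const mismatch =
      entry.etag !== undefined && res.etag !== undefined && entry.etag !== res.etag;
    if (!mismatch) return res.data;
    if (attempt >= maxRetries) throw new Error("ETag mismatch persisted after retries");
    entry = await loadEntry(); // directory was stale: refresh and retry
  }
}
```

If datasets change infrequently, one reload should normally be enough; a persistent mismatch more likely means the archive is being rewritten while it is read.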
implications:
- clients may need a way to manually cache bust
- clients may need a cache TTL setting
- storage implementations could take an ETag parameter to send with the request (as If-None-Match) and get a 304 Not Modified when nothing has changed; this is a very slim optimization, however
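The three implications above can be sketched together as a small cache wrapper: a TTL, a manual `invalidate`, and revalidation that passes the old ETag so the source can answer with the equivalent of a 304. Everything here (`TtlCache`, `ConditionalFetch`, the `notModified` flag) is a hypothetical shape for discussion, not the existing implementation.

```typescript
// One cached value with the ETag it was fetched under and its fetch time.
interface CacheEntry<T> {
  value: T;
  etag?: string;
  fetchedAt: number; // milliseconds since epoch
}

// Result of a conditional fetch; notModified mirrors an HTTP 304 response.
type Revalidated<T> =
  | { notModified: true }
  | { notModified: false; value: T; etag?: string };

// Hypothetical fetcher: would send If-None-Match when an ETag is supplied.
type ConditionalFetch<T> = (key: string, ifNoneMatch?: string) => Promise<Revalidated<T>>;

class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();

  constructor(
    private fetcher: ConditionalFetch<T>,
    private ttlMs: number,
    private now: () => number = () => Date.now()
  ) {}

  // Manual cache bust: drop one key, or everything when no key is given.
  invalidate(key?: string): void {
    if (key === undefined) this.entries.clear();
    else this.entries.delete(key);
  }

  async get(key: string): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && this.now() - hit.fetchedAt < this.ttlMs) return hit.value; // still fresh

    // Expired or missing: revalidate, passing the old ETag if we have one.
    const res = await this.fetcher(key, hit?.etag);
    if (res.notModified) {
      if (!hit) throw new Error("got not-modified without a cached entry");
      hit.fetchedAt = this.now(); // unchanged on the server: extend freshness
      return hit.value;
    }
    this.entries.set(key, { value: res.value, etag: res.etag, fetchedAt: this.now() });
    return res.value;
  }
}
```

The TTL is what lets a metadata-only client eventually notice a new archive without ever requesting a tile; the conditional request only saves re-downloading bytes that have not changed.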
Expires and Cache-Control headers are now passed through, which clients like MapLibre can take advantage of.
Metadata is not cached directly by the implementation.