[Feature] - Cache backend files/blobs/objects
Describe the solution you'd like
Would you consider caching backend objects in the gateway? This would greatly benefit repeated-read scenarios. Another use case is warming up the cache in advance when the user knows what will be read next, so reads complete faster.
The main issue with read cache is when load balancing multiple gateways. If a cached object gets overwritten through a different gateway, there is no way to actively invalidate the cache on the other gateway. Maybe this could be an option for single gateway use cases though.
Right, overwrite scenarios would cause consistency issues. How about making the cache feature optional? If the user knows the data is immutable, they can enable it. I suppose most users of object storage do not overwrite their objects frequently, so this would be an appealing feature for them.
One possibility would be to save the ETag of the file in the cache. S3 as well as Azure Blob will return status 304 if the object has not changed. This validates the content while skipping the download, since the body can be served from the cache. This would massively reduce response time and outgoing traffic.
It might also be possible to make the ETag re-validation interval configurable, so that a conditional GET is only issued after a certain time has elapsed, reducing the number of GET requests to the cloud services.
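A minimal sketch of that revalidation flow (all names here are illustrative, not the gateway's actual API): the cache keeps the body, its ETag, and the last validation time, serves fresh entries directly, and issues a conditional GET only after the configured interval.

```python
import time

class EtagCache:
    """Toy read cache that revalidates entries with a conditional GET.

    `fetch(key, etag)` stands in for the backend request: it returns
    (status, body, etag), where status 304 means "not modified".
    """

    def __init__(self, fetch, revalidate_after=60.0):
        self.fetch = fetch
        self.revalidate_after = revalidate_after  # seconds between conditional GETs
        self._entries = {}  # key -> (body, etag, last_validated)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(key)

        # Fresh enough: serve straight from cache, no backend request at all.
        if entry and now - entry[2] < self.revalidate_after:
            return entry[0]

        etag = entry[1] if entry else None
        status, body, new_etag = self.fetch(key, etag)

        if status == 304 and entry:
            # Backend confirmed the object is unchanged; keep the cached body.
            self._entries[key] = (entry[0], entry[1], now)
            return entry[0]

        # 200: store the new body and its ETag.
        self._entries[key] = (body, new_etag, now)
        return body
```

With S3 or Azure Blob, `fetch` would put the stored ETag in the `If-None-Match` request header, so a 304 response costs a round trip but no body transfer.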
This ETag can be saved in an xattr (extended attribute) of the cached file, or calculated on the fly when the etag xattr is not found.