rattler
[BUG] - py-rattler `cache_control` parameter for reliable caching is not working as expected
Checklist
- [x] I added a descriptive title
- [x] I searched open reports and couldn't find a duplicate
What happened?
rattler caches repodata.json for the solve operation. Whether repodata.json is fetched again depends on three values taken from the HTTP response headers:
- etag
- cache_control
- last_modified
Of these three, the cache_control handling is not working as expected. I observed this while running the solve operation. Here is the sequence of events:
- I ran the solve and saved the lockfile (lockfile1)
- I pushed a new package to my private conda channel, let's call it packageA
- I ran the solve again and saved the lockfile (lockfile2)
Rattler still uses the old version of the package; the version is the same in lockfile1 and lockfile2.
The HTTP response from the private conda channel includes a cache-control header with max-age=30, so the cached copy should be treated as stale after 30 seconds. However, rattler still uses the cached version of repodata.json to run the solve. This behavior is unexpected and should be fixed.
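To make the expectation concrete, here is a minimal sketch of the freshness check implied by a `cache-control: max-age=30` header. It is not rattler's implementation; the function names `parse_cache_control` and `is_fresh` are hypothetical and only illustrate the rule that a cached response older than max-age should be revalidated.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into a {directive: value} mapping.

    Directives without a value (e.g. must-revalidate) map to None.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def is_fresh(cache_control: str, age_seconds: float) -> bool:
    """Return True if a cached response of this age may be reused as-is."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None or not max_age.isdigit():
        # Assumption of this sketch: without a usable max-age, revalidate.
        return False
    # The response is fresh only while its age is below the lifetime.
    return age_seconds < int(max_age)


header = "max-age=30, must-revalidate"
print(is_fresh(header, 10))  # cached 10 s ago
print(is_fresh(header, 31))  # cached 31 s ago
```

With the header from this report, a copy cached 10 seconds ago counts as fresh, while a copy cached 31 seconds ago does not and would need to be revalidated.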
Additional Context
The HTTP response received from the private conda repository
$ curl -I https://xxx/:<REDACTED_TOKEN>@conda.registry.XXX/linux-64/repodata.json
HTTP/1.1 200 Connection established
HTTP/2 200
content-type: application/json
content-length: 112290
date: Fri, <REDACTED> May <REDACTED>
allow: GET, HEAD, OPTIONS
cache-control: max-age=30, must-revalidate
x-ratelimit-interval: 0
x-ratelimit-limit: 1512000
x-ratelimit-remaining: 1511999
x-ratelimit-reset: 1746212526
referrer-policy: same-origin
cross-origin-opener-policy: same-origin
server: <REDACTED>
x-frame-options: DENY
x-content-type-options: nosniff
x-xss-protection: 0
vary: Accept-Encoding,Cookie
x-cache: <REDACTED>
x-amz-cf-pop: <REDACTED>
alt-svc: h3=":443"; ma=86400
x-amz-cf-id: <REDACTED>
Can you check the cache directory? It should contain the cached repodata as well as the metadata used to fetch it.
Or are you not shutting down the process in between? The Gateway caches the repodata in memory indefinitely, but there is a method to clear specific caches.
Hi @baszalmstra Thanks for the suggestion! Yes, you got that right. We are not shutting down the process in between. Rattler is run as a part of a micro-service and the service keeps running.
As a workaround, we used per_channel_config in the Gateway constructor to discard the cache for our private conda channel by setting the no-cache option for that specific channel. However, I wanted to take this opportunity to file the bug and contribute a fix, which I am working on now.
@wolfv Would you mind assigning this bug to me?
I assigned you @gauraangkhurana - let us know if you have questions :)
Another option is to use this method: https://conda.github.io/rattler/py-rattler/gateway/#rattler.repo_data.gateway.Gateway.clear_repodata_cache
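For a long-running service, one way to apply this is to clear the in-memory repodata for a channel once a chosen interval has passed, before the next solve. The sketch below is only an illustration: `PeriodicCacheClearer` and `ensure_fresh` are hypothetical names, and it assumes `clear_repodata_cache` can be called with the channel as its single argument. The `gateway` argument is duck-typed, so any object exposing that method works.

```python
import time
from typing import Callable, Dict


class PeriodicCacheClearer:
    """Clears a gateway's in-memory repodata for a channel at most once per TTL.

    Hypothetical helper: `gateway` is any object with a
    clear_repodata_cache(channel) method (assumed call shape).
    """

    def __init__(
        self,
        gateway,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._gateway = gateway
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_cleared: Dict[str, float] = {}

    def ensure_fresh(self, channel: str) -> bool:
        """Clear the channel's cache if the TTL has elapsed; return True if cleared."""
        now = self._clock()
        last = self._last_cleared.get(channel)
        if last is not None and now - last < self._ttl:
            return False  # cleared recently, keep the in-memory data
        self._gateway.clear_repodata_cache(channel)
        self._last_cleared[channel] = now
        return True
```

Calling `ensure_fresh(channel)` right before each solve, with `ttl_seconds` set to the channel's max-age (30 in this report), keeps the in-memory data from outliving the server's stated lifetime while still avoiding a refetch on every request. The first call for a channel always clears, since the helper has no record of when the data was loaded.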