
[BUG] - py-rattler `cache_control` parameter for reliable caching is not working as expected

Open gauraangkhurana opened this issue 5 months ago • 5 comments

Checklist

  • [x] I added a descriptive title
  • [x] I searched open reports and couldn't find a duplicate

What happened?

rattler caches repodata.json for the solve operation. Whether repodata.json is fetched again depends on three values taken from the HTTP response headers:

  1. etag
  2. cache_control
  3. last_modified

Of these three, cache_control is not working as expected. I observed this while running the solve operation. Here is the sequence of events:

  • I ran the solve and saved the lockfile (lockfile1)
  • I pushed a new package to my private conda channel, let's call it packageA
  • I ran the solve again and saved the lockfile (lockfile2)
  • Rattler still used the old version of the package; it is the same in lockfile1 and lockfile2

The HTTP response from the private conda channel contains a cache-control header with max-age=30. However, rattler still used the cached version of repodata.json to run the solve. This behavior is unexpected and should be fixed.

Additional Context

The HTTP response received from the private conda repository

$ curl -I https://xxx/:<REDACTED_TOKEN>@conda.registry.XXX/linux-64/repodata.json

HTTP/1.1 200 Connection established

HTTP/2 200
content-type: application/json
content-length: 112290
date: Fri, <REDACTED> May <REDACTED>
allow: GET, HEAD, OPTIONS
cache-control: max-age=30, must-revalidate
x-ratelimit-interval: 0
x-ratelimit-limit: 1512000
x-ratelimit-remaining: 1511999
x-ratelimit-reset: 1746212526
referrer-policy: same-origin
cross-origin-opener-policy: same-origin
server: <REDACTED>
x-frame-options: DENY
x-content-type-options: nosniff
x-xss-protection: 0
vary: Accept-Encoding,Cookie
x-cache: <REDACTED>
x-amz-cf-pop: <REDACTED>
alt-svc: h3=":443"; ma=86400
x-amz-cf-id: <REDACTED>

gauraangkhurana avatar May 31 '25 17:05 gauraangkhurana
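For reference, the freshness rule implied by the `cache-control: max-age=30, must-revalidate` header above can be sketched in a few lines of Python. This is only an illustration of the HTTP rule, not rattler's implementation: the helper names `parse_max_age` and `is_fresh` are hypothetical, and the sketch ignores the `Age` header, `no-cache`, `s-maxage` and heuristic freshness.

```python
import re
import time
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if the directive is absent."""
    for directive in cache_control.split(","):
        match = re.fullmatch(r"max-age=(\d+)", directive.strip().lower())
        if match:
            return int(match.group(1))
    return None


def is_fresh(cache_control: str, fetched_at: float, now: Optional[float] = None) -> bool:
    """True while the cached response is younger than its max-age."""
    max_age = parse_max_age(cache_control)
    if max_age is None:
        # Conservative assumption for this sketch: without max-age, treat as stale.
        return False
    now = time.time() if now is None else now
    return (now - fetched_at) < max_age
```

With the header above, a copy fetched 20 seconds ago would count as fresh and one fetched 31 seconds ago would need revalidation, which is the behavior I expected to see between the two solves.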

Can you check the cache directory? It should contain the cached repodata as well as the metadata used to fetch it.

Or are you not shutting down the process in between? The Gateway caches the repodata in memory indefinitely, but there is a method to clear specific caches.

baszalmstra avatar May 31 '25 17:05 baszalmstra

Hi @baszalmstra, thanks for the suggestion! Yes, you got that right. We are not shutting down the process in between. Rattler runs as part of a microservice, and the service keeps running.

We used per_channel_config in the Gateway constructor as a workaround to discard the cache for our private conda channel, using the no-cache option for that channel. However, I wanted to take this opportunity to file the bug and contribute a fix, which I am working on now.

gauraangkhurana avatar Jun 01 '25 17:06 gauraangkhurana
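A minimal sketch of the per_channel_config workaround described above, for anyone landing here with the same problem. It assumes that `SourceConfig` is importable from `rattler.repo_data.gateway` alongside `Gateway` and that it accepts `cache_action="no-cache"`; check both against the py-rattler version you have installed. `make_gateway` and its `private_channel_url` argument are hypothetical names used only for illustration.

```python
def make_gateway(private_channel_url: str):
    """Build a Gateway that does not reuse cached repodata for one channel."""
    # Imported inside the function so the sketch loads even where
    # py-rattler is not installed.
    from rattler.repo_data.gateway import Gateway, SourceConfig

    return Gateway(
        per_channel_config={
            # The key is the URL of the channel that should bypass the cache;
            # the caller supplies it (hypothetical placeholder).
            private_channel_url: SourceConfig(cache_action="no-cache"),
        },
    )
```

Other channels keep the default caching behavior, so only the private channel pays the cost of re-fetching repodata.json on every solve.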

@wolfv Would you mind assigning this bug to me?

gauraangkhurana avatar Jun 01 '25 17:06 gauraangkhurana

I assigned you @gauraangkhurana - let us know if you have questions :)

wolfv avatar Jun 01 '25 17:06 wolfv

Another option is to use this method: https://conda.github.io/rattler/py-rattler/gateway/#rattler.repo_data.gateway.Gateway.clear_repodata_cache

baszalmstra avatar Jun 01 '25 18:06 baszalmstra
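For a long-running service, the `clear_repodata_cache` method linked above could be wrapped so the Gateway's in-memory copy is dropped at most once per time window. This is a sketch under assumptions: `ChannelRefresher` and `ttl_seconds` are hypothetical names, the 30 second default simply mirrors the max-age in the response above, and the only Gateway call it relies on is `clear_repodata_cache(channel)`. As far as I understand the documentation, that call clears the in-memory cache only, not the on-disk one.

```python
import time
from typing import Any, Dict, Optional


class ChannelRefresher:
    """Clears a Gateway's in-memory repodata for a channel once it is older than ttl_seconds."""

    def __init__(self, gateway: Any, ttl_seconds: float = 30.0) -> None:
        self._gateway = gateway
        self._ttl = ttl_seconds
        self._last_cleared: Dict[str, float] = {}

    def maybe_clear(self, channel_url: str, now: Optional[float] = None) -> bool:
        """Clear the channel's cache if the window has elapsed; return True if cleared."""
        now = time.monotonic() if now is None else now
        last = self._last_cleared.get(channel_url)
        if last is not None and (now - last) < self._ttl:
            return False
        self._gateway.clear_repodata_cache(channel_url)
        self._last_cleared[channel_url] = now
        return True
```

Calling `maybe_clear(channel_url)` right before each solve would keep the service running while still picking up newly pushed packages after the window has passed.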