
Add ETag support for spectra list caching

tlambert03 opened this issue 4 months ago · 0 comments

Background

The spectra list endpoint (/graphql/ with SpectraList query and /api/proteins/spectraslugs/) returns ~1.3 MB of data (compressed to 140 KB with gzip). This data is relatively stable - it may not change for days or weeks at a time.

Current state:

  • ✅ Server-side Redis caching (1 hour)
  • ✅ Client-side React Query caching (10 minutes in memory)
  • ✅ HTTP Cache-Control: public, max-age=600 (10 minutes browser cache)
  • ❌ No ETags - browser re-downloads 140 KB every 10 minutes even if data hasn't changed

The Problem

Even with current caching:

  • After 10 minutes, browser cache expires
  • User refreshes page → Downloads 140 KB again
  • Even if the data hasn't changed for days!

Impact:

  • Wasted bandwidth: ~20 MB/day per active user (140 KB every 10 minutes)
  • Unnecessary server load: Full GraphQL query + Redis lookup every 10 min
  • Poor mobile experience: Large downloads on cellular

The Solution: ETags (Entity Tags)

ETags enable conditional requests - the browser can ask "has this changed?" and get a tiny response if it hasn't.

How It Works

```http
# First request
GET /api/proteins/spectraslugs/
→ 200 OK
  ETag: "v123"
  Cache-Control: public, max-age=600
  [140 KB payload]

# After cache expires (10+ minutes later)
GET /api/proteins/spectraslugs/
If-None-Match: "v123"
→ 304 Not Modified
  ETag: "v123"
  [NO PAYLOAD - just headers, < 1 KB!]

# Browser uses cached 140 KB ✓

# After data actually changes
GET /api/proteins/spectraslugs/
If-None-Match: "v123"
→ 200 OK
  ETag: "v124"   ← New version!
  [140 KB new payload]
```
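The server-side decision behind this exchange is a single ETag comparison. A minimal, framework-free sketch of it (the `respond` helper and the stand-in payload are illustrative, not FPbase code):

```python
PAYLOAD = b'{"data": {"spectra": []}}'  # stand-in for the real 140 KB body


def respond(if_none_match, version):
    """Sketch of the conditional-GET decision: compare the client's
    If-None-Match header against the current version's ETag, and
    skip the payload entirely on a match."""
    etag = f'"{version}"'
    headers = {"ETag": etag, "Cache-Control": "public, max-age=600"}
    if if_none_match == etag:
        return 304, b"", headers   # headers only, < 1 KB on the wire
    return 200, PAYLOAD, headers   # full payload plus the current validator


# First request: no validator yet -> 200 with the payload
status, body, _ = respond(None, 123)      # status == 200
# Revalidation with a matching ETag -> 304, empty body
status, body, _ = respond('"123"', 123)   # status == 304
# Data changed (version bumped) -> 200 with a fresh ETag
status, body, _ = respond('"123"', 124)   # status == 200
```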

Timeline Example

```
Day 1, 9:00am:  User loads /spectra/  → 140 KB download, ETag: "123"
Day 1, 9:05am:  Refresh               → Browser cache (0 KB)
Day 1, 9:15am:  Refresh               → 304 validation (< 1 KB)
Day 2, 3:00pm:  Return                → 304 validation (< 1 KB)
Day 7:          Return                → 304 validation (< 1 KB)
Day 14:         Admin adds spectrum   → Version "124"
Day 14, 2:00pm: User refresh          → 200 with new data (140 KB) ✓
```

Bandwidth savings: 140 KB initial + ~1 KB per validation = ~99% reduction for stable data

Implementation Options

Option A: ETag for GraphQL SpectraList query

Location: `backend/fpbase/views.py` - `RateLimitedGraphQLView`

Approach:

```python
def dispatch(self, request, *args, **kwargs):
    # Only intercept the SpectraList query
    is_spectra_list = b'SpectraList' in request.body
    etag = None
    if is_spectra_list:
        version = cache.get('spectra_sluglist_version', 0)
        etag = f'"{version}"'

        # Short-circuit if the client already has the current version
        if request.META.get('HTTP_IF_NONE_MATCH') == etag:
            response = HttpResponse(status=304)
            response['ETag'] = etag
            response['Cache-Control'] = 'public, max-age=600'
            return response

    response = super().dispatch(request, *args, **kwargs)

    # Attach the validator to successful SpectraList responses
    if response.status_code == 200 and is_spectra_list:
        response['ETag'] = etag

    return response
```

Pros:

  • Works with current GraphQL setup
  • Frontend doesn't need changes

Cons:

  • Need to parse GraphQL request body
  • Mixing concerns in GraphQL view
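One way to soften the body-parsing concern: rather than a raw substring test on `request.body`, parse the JSON envelope and check `operationName` first. A hedged sketch (the helper name is made up, not existing FPbase code):

```python
import json


def is_spectra_list_query(body: bytes) -> bool:
    """Return True if this GraphQL request is the SpectraList query.
    Prefers the explicit operationName, falling back to a substring
    check on the query text; anything unparseable is treated as False."""
    try:
        payload = json.loads(body)
    except (ValueError, UnicodeDecodeError):
        return False
    if not isinstance(payload, dict):
        return False
    if payload.get("operationName") == "SpectraList":
        return True
    return "SpectraList" in payload.get("query", "")
```

This keeps the GraphQL view's `dispatch` down to one readable call while avoiding false positives from, say, a mutation whose variables merely contain the string.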

Option B: ETag for REST endpoint

Location: `backend/proteins/api/views.py` - `spectraslugs` view

Approach:

```python
@require_http_methods(["GET", "HEAD"])
def spectraslugs(request):
    version = cache.get('spectra_sluglist_version', 0)
    etag = f'"{version}"'

    # Check the If-None-Match header
    if request.META.get('HTTP_IF_NONE_MATCH') == etag:
        response = HttpResponse(status=304)
        response['ETag'] = etag
        response['Cache-Control'] = 'public, max-age=600'
        return response

    spectrainfo = get_cached_spectra_info()
    response = HttpResponse(spectrainfo, content_type="application/json")
    response['ETag'] = etag
    response['Cache-Control'] = 'public, max-age=600'
    return response
```

Pros:

  • Simpler (one endpoint = one resource)
  • REST endpoint already has caching patterns
  • Easier to test

Cons:

  • Frontend might need to switch from GraphQL to REST for this query

Option C: Both

Support ETags on both endpoints for flexibility.
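If both endpoints grow ETag support, the check is worth factoring into one shared helper so the two views can't drift apart. A sketch with illustrative names (not existing FPbase code):

```python
def spectra_etag(version: int) -> str:
    """Build the quoted validator both endpoints would emit."""
    return f'"{version}"'


def is_not_modified(if_none_match, version: int) -> bool:
    """Shared conditional-GET check: True means the view should
    short-circuit with a bodyless 304 carrying the same ETag."""
    return if_none_match == spectra_etag(version)
```

Each view then reads the version from the cache, calls `is_not_modified(request.META.get('HTTP_IF_NONE_MATCH'), version)`, and either returns an empty 304 or attaches the ETag to its normal 200 response.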

Version Tracking

Already have infrastructure:

```python
# backend/proteins/models/spectrum.py:424
def save(self, *args, **kwargs):
    cache.delete(SPECTRA_CACHE_KEY)  # ← Already invalidates on change
    super().save(*args, **kwargs)
```

Need to add:

```python
SPECTRA_VERSION_KEY = "spectra_sluglist_version"


def save(self, *args, **kwargs):
    cache.delete(SPECTRA_CACHE_KEY)
    try:
        cache.incr(SPECTRA_VERSION_KEY, delta=1)  # ← Increment version
    except ValueError:  # key doesn't exist yet
        cache.set(SPECTRA_VERSION_KEY, 1, None)
    super().save(*args, **kwargs)
```

And in cache population:

```python
def get_cached_spectra_info(timeout=60 * 60):
    spectrainfo = cache.get(SPECTRA_CACHE_KEY)
    if not spectrainfo:
        # Ensure the version exists and increment it
        try:
            cache.incr(SPECTRA_VERSION_KEY, delta=1)
        except ValueError:  # key doesn't exist
            cache.set(SPECTRA_VERSION_KEY, 1, None)  # never expires

        spectrainfo = json.dumps({"data": {"spectra": Spectrum.objects.sluglist()}})
        cache.set(SPECTRA_CACHE_KEY, spectrainfo, timeout)
    return spectrainfo
```
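The try/except around `cache.incr` matters because Django's cache backends raise `ValueError` when the key is missing. The pattern can be exercised against a tiny in-memory stand-in for the cache API (illustrative, not Django itself):

```python
SPECTRA_VERSION_KEY = "spectra_sluglist_version"


class MemoryCache:
    """Minimal stand-in for Django's cache API, just enough to
    exercise the version-bump pattern above."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, timeout=None):
        self._data[key] = value

    def incr(self, key, delta=1):
        if key not in self._data:
            raise ValueError(key)  # mirrors Django for missing keys
        self._data[key] += delta
        return self._data[key]


def bump_version(cache):
    """Increment the spectra version, seeding it on first use."""
    try:
        return cache.incr(SPECTRA_VERSION_KEY, delta=1)
    except ValueError:  # key doesn't exist yet
        cache.set(SPECTRA_VERSION_KEY, 1, None)
        return 1


cache = MemoryCache()
bump_version(cache)  # first call seeds the version at 1
bump_version(cache)  # later calls increment: 2, 3, ...
```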

Expected Impact

Bandwidth:

  • Current: ~20 MB/day per user (140 KB × 144 page loads)
  • With ETags: ~212 KB/day per user (~99% reduction!)

Server Load:

  • Current: Full GraphQL query + Redis lookup every 10 min
  • With ETags: Version number check (microseconds) → 304 response

User Experience:

  • Faster page loads (304 response < 10ms vs 140 KB download)
  • Better mobile experience (minimal cellular data usage)
  • Always get latest data when it changes

Testing Checklist

  • [ ] ETag generated correctly from version number
  • [ ] 304 response when If-None-Match matches
  • [ ] 200 response with new ETag when version changes
  • [ ] Version increments when Spectrum is saved
  • [ ] Version increments when cache is rebuilt
  • [ ] Works across browser refreshes
  • [ ] Works in different tabs
  • [ ] Works after days/weeks
  • [ ] GZip compression still works with ETags
  • [ ] Vary header includes Accept-Encoding

References

Related

  • #360 (feat: enable gzip - implemented)
  • FPBASE-5ZP (Large HTTP payload - resolved by gzip)

tlambert03 · Nov 07 '25