Add ETag support for spectra list caching
## Background

The spectra list endpoint (`/graphql/` with the `SpectraList` query, and `/api/proteins/spectraslugs/`) returns ~1.3 MB of data (~140 KB gzip-compressed). The data is relatively stable - it may not change for days or weeks at a time.
**Current state:**

- ✅ Server-side Redis caching (1 hour)
- ✅ Client-side React Query caching (10 minutes in memory)
- ✅ HTTP `Cache-Control: public, max-age=600` (10-minute browser cache)
- ❌ No ETags - the browser re-downloads 140 KB every 10 minutes even if the data hasn't changed
## The Problem
Even with current caching:
- After 10 minutes, browser cache expires
- User refreshes page → Downloads 140 KB again
- Even if the data hasn't changed for days!
**Impact:**

- Wasted bandwidth: ~20 MB/day per heavy user (140 KB × 144 ten-minute cache expiries)
- Unnecessary server load: a full GraphQL query + Redis lookup every 10 minutes
- Poor mobile experience: large repeated downloads over cellular
## The Solution: ETags (Entity Tags)

ETags enable conditional requests - the browser can ask "has this changed?" and receive a tiny response when it hasn't.

### How It Works
```http
# First request
GET /api/proteins/spectraslugs/
→ 200 OK
  ETag: "v123"
  Cache-Control: public, max-age=600
  [140 KB payload]

# After cache expires (10+ minutes later)
GET /api/proteins/spectraslugs/
If-None-Match: "v123"
→ 304 Not Modified
  ETag: "v123"
  [no payload - just headers, < 1 KB]
  Browser uses its cached 140 KB ✓

# After data actually changes
GET /api/proteins/spectraslugs/
If-None-Match: "v123"
→ 200 OK
  ETag: "v124"   ← new version
  [140 KB new payload]
```
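The server-side decision reduces to a single string comparison. A minimal sketch of that logic (the function name and return shape are hypothetical, for illustration only):

```python
def conditional_response(current_version, if_none_match):
    """Decide between 304 and 200 for a conditional request.

    Returns (status_code, etag, payload_needed).
    """
    etag = f'"{current_version}"'  # strong ETag derived from the version counter
    if if_none_match == etag:
        # Client already holds the current version: no payload needed.
        return 304, etag, False
    # Client is stale (or has no cached copy): send the full payload.
    return 200, etag, True
```

For example, `conditional_response(123, '"123"')` yields a 304 with no payload, while `conditional_response(124, '"123"')` yields a 200 carrying the new ETag.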
### Timeline Example

```
Day 1, 9:00am:  User loads /spectra/   → 140 KB download, ETag: "123"
Day 1, 9:05am:  Refresh                → browser cache (0 KB)
Day 1, 9:15am:  Refresh                → 304 validation (< 1 KB)
Day 2, 3:00pm:  Return visit           → 304 validation (< 1 KB)
Day 7:          Return visit           → 304 validation (< 1 KB)
Day 14:         Admin adds a spectrum  → version becomes "124"
Day 14, 2:00pm: User refresh           → 200 with new data (140 KB) ✓
```

**Bandwidth savings:** 140 KB for the initial download, then < 1 KB per validation - a > 99% per-request reduction while the data is stable.
## Implementation Options

### Option A: ETag for the GraphQL SpectraList query

**Location:** `backend/fpbase/views.py` - `RateLimitedGraphQLView`
**Approach:**

```python
def dispatch(self, request, *args, **kwargs):
    # Only apply ETag handling to the SpectraList query
    is_spectra_list = b"SpectraList" in request.body
    if is_spectra_list:
        version = cache.get("spectra_sluglist_version", 0)
        etag = f'"{version}"'

        # Client already has the current version → 304, no payload
        if request.META.get("HTTP_IF_NONE_MATCH") == etag:
            response = HttpResponse(status=304)
            response["ETag"] = etag
            response["Cache-Control"] = "public, max-age=600"
            return response

    response = super().dispatch(request, *args, **kwargs)

    if is_spectra_list and response.status_code == 200:
        response["ETag"] = etag
        response["Cache-Control"] = "public, max-age=600"
    return response
```
**Pros:**

- Works with the current GraphQL setup
- Frontend doesn't need changes

**Cons:**

- Requires inspecting the GraphQL request body
- Mixes caching concerns into the GraphQL view
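A raw substring match on the body is fragile (any query that happens to contain the text "SpectraList" would trigger the branch). A more robust check, sketched here as a hypothetical helper, parses the JSON body and compares `operationName`:

```python
import json


def is_spectra_list_query(body):
    """Return True if the GraphQL request body is the SpectraList operation.

    Falls back to False on any parse error so normal dispatch proceeds.
    """
    try:
        payload = json.loads(body)
    except (ValueError, UnicodeDecodeError):  # malformed body → not our query
        return False
    return isinstance(payload, dict) and payload.get("operationName") == "SpectraList"
```

This assumes the frontend sends a named operation; if it doesn't, the substring check (or parsing the `query` field) remains the fallback.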
### Option B: ETag for the REST endpoint

**Location:** `backend/proteins/api/views.py` - `spectraslugs` view
**Approach:**

```python
@require_http_methods(["GET", "HEAD"])
def spectraslugs(request):
    version = cache.get("spectra_sluglist_version", 0)
    etag = f'"{version}"'

    # Check the If-None-Match header
    if request.META.get("HTTP_IF_NONE_MATCH") == etag:
        response = HttpResponse(status=304)
        response["ETag"] = etag
        response["Cache-Control"] = "public, max-age=600"
        return response

    spectrainfo = get_cached_spectra_info()
    response = HttpResponse(spectrainfo, content_type="application/json")
    response["ETag"] = etag
    response["Cache-Control"] = "public, max-age=600"
    return response
```
**Pros:**

- Simpler (one endpoint = one resource)
- The REST endpoint already has caching patterns
- Easier to test

**Cons:**

- Frontend might need to switch from GraphQL to REST for this query
### Option C: Both

Support ETags on both endpoints for flexibility.

## Version Tracking
We already have the invalidation infrastructure:

```python
# backend/proteins/models/spectrum.py:424
def save(self, *args, **kwargs):
    cache.delete(SPECTRA_CACHE_KEY)  # ← already invalidates on change
    super().save(*args, **kwargs)
```
We need to add a version counter. Note that `cache.incr()` raises `ValueError` when the key does not exist, so initialize it on first use:

```python
SPECTRA_VERSION_KEY = "spectra_sluglist_version"

def save(self, *args, **kwargs):
    cache.delete(SPECTRA_CACHE_KEY)
    try:
        cache.incr(SPECTRA_VERSION_KEY, delta=1)  # ← increment version
    except ValueError:  # key doesn't exist yet
        cache.set(SPECTRA_VERSION_KEY, 1, None)  # never expire
    super().save(*args, **kwargs)
```
And in cache population:

```python
def get_cached_spectra_info(timeout=60 * 60):
    spectrainfo = cache.get(SPECTRA_CACHE_KEY)
    if not spectrainfo:
        # Ensure the version exists, and increment it on every rebuild
        try:
            cache.incr(SPECTRA_VERSION_KEY, delta=1)
        except ValueError:  # key doesn't exist yet
            cache.set(SPECTRA_VERSION_KEY, 1, None)  # never expire
        spectrainfo = json.dumps({"data": {"spectra": Spectrum.objects.sluglist()}})
        cache.set(SPECTRA_CACHE_KEY, spectrainfo, timeout)
    return spectrainfo
```
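The interplay between the counter and the ETag can be exercised without Redis or Django; a self-contained sketch with a plain-dict stand-in for the cache (all names here are hypothetical, not FPbase code):

```python
class FakeCache:
    """Minimal stand-in for the Django cache API (get/set/delete/incr)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)

    def incr(self, key, delta=1):
        # Mirrors Django: incrementing a missing key raises ValueError
        if key not in self._data:
            raise ValueError(f"Key {key!r} not found")
        self._data[key] += delta
        return self._data[key]


cache = FakeCache()
VERSION_KEY = "spectra_sluglist_version"


def bump_version():
    """Increment-or-initialize, mirroring the save()/rebuild hooks above."""
    try:
        cache.incr(VERSION_KEY)
    except ValueError:
        cache.set(VERSION_KEY, 1, None)


def current_etag():
    return f'"{cache.get(VERSION_KEY, 0)}"'
```

The first bump initializes the counter to 1 (yielding ETag `"1"`); each subsequent save or cache rebuild increments it, guaranteeing stale ETags stop matching.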
## Expected Impact

**Bandwidth:**

- Current: ~20 MB/day per heavy user (140 KB × 144 ten-minute cache expiries)
- With ETags: ~212 KB/day per user (~99% reduction)
**Server load:**

- Current: full GraphQL query + Redis lookup every 10 minutes
- With ETags: a version-number check (microseconds) followed by a 304 response

**User experience:**

- Faster page loads (a tiny 304 response vs a 140 KB download)
- Better mobile experience (minimal cellular data usage)
- Users still get the latest data as soon as it changes
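The bandwidth figures above follow from simple arithmetic, assuming a worst case of one page load per 10-minute cache window and ~0.5 KB per 304 validation:

```python
payload_kb = 140          # gzip-compressed spectra list
validation_kb = 0.5       # 304 response: headers only
loads_per_day = 24 * 6    # one load per 10-minute window = 144

# Without ETags: the full payload is re-downloaded on every expiry
current_kb = payload_kb * loads_per_day

# With ETags: one full download, then cheap validations
with_etags_kb = payload_kb + validation_kb * (loads_per_day - 1)

reduction = 1 - with_etags_kb / current_kb
```

This gives ~20,160 KB/day without ETags versus ~212 KB/day with them, a roughly 99% reduction while the data is stable.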
## Testing Checklist
- [ ] ETag generated correctly from version number
- [ ] 304 response when If-None-Match matches
- [ ] 200 response with new ETag when version changes
- [ ] Version increments when Spectrum is saved
- [ ] Version increments when cache is rebuilt
- [ ] Works across browser refreshes
- [ ] Works in different tabs
- [ ] Works after days/weeks
- [ ] GZip compression still works with ETags
- [ ] Vary header includes Accept-Encoding
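The first three checklist items can be exercised without a browser or Django; a self-contained round-trip simulation (both `server` and `Client` are hypothetical stand-ins for the real endpoint and browser cache):

```python
def server(version, if_none_match=None):
    """Simulated endpoint: returns (status, etag, body)."""
    etag = f'"{version}"'
    if if_none_match == etag:
        return 304, etag, None  # validated: no payload
    return 200, etag, "140 KB payload"


class Client:
    """Simulated browser cache: remembers the last ETag and body."""

    def __init__(self):
        self.etag = None
        self.body = None

    def fetch(self, version):
        status, etag, body = server(version, if_none_match=self.etag)
        if status == 200:
            # Store the fresh payload and its validator
            self.etag, self.body = etag, body
        return status
```

A first fetch returns 200, a repeat fetch at the same version returns 304, and a fetch after a version bump returns 200 with the new ETag.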
## References

**Related:**
- #360 (feat: enable gzip - implemented)
- FPBASE-5ZP (Large HTTP payload - resolved by gzip)