openlibrary icon indicating copy to clipboard operation
openlibrary copied to clipboard

fix: prevent caching carousel errors when Solr fails

Open akramcodez opened this issue 1 month ago • 4 comments

Fixes #11373

Problem

When Solr fails, carousel error responses were cached for 5 minutes, causing all users to see empty carousels even after Solr recovered.

Solution

Add conditional caching logic to detect and skip caching error responses. Failed requests now trigger fresh retries instead of serving cached errors.


Files Changed

openlibrary/core/cache.py

Added optional cacheable parameter to memcache_memoize:

  • Callback function that determines if a value should be cached
  • Returns True to cache, False to skip
  • Wrapped in try/except for fail-safe behavior

openlibrary/plugins/upstream/utils.py

Added carousel validation logic:

  • is_cacheable_carousel_content() - checks rendered HTML for error indicators, missing carousel structure, or empty content
  • is_cacheable_macro() - applies carousel validation only to carousel macros, default caching for others
  • Connected validation to memcache_memoize via cacheable parameter

openlibrary/plugins/worksearch/code.py

Added Solr error handling:

  • Check for response.error or response.raw_resp is None before processing
  • Return structured error dict instead of crashing with TypeError
  • Log errors for monitoring

openlibrary/tests/core/test_cache.py

Added comprehensive tests:

  • test_cacheable_callback() - verifies normal values cached, errors not cached
  • test_cacheable_callback_edge_cases() - tests None, empty, partial data handling
  • test_cacheable_callback_exception_handling() - verifies fail-safe behavior

Testing

pytest openlibrary/tests/core/test_cache.py -v

All tests pass. No breaking changes to existing functionality.


Checklist

  • [x] Fixes reported issue
  • [x] Prevents caching on errors
  • [x] Adds tests to verify behavior
  • [x] No breaking changes
  • [x] Fail-safe error handling

Stakeholders

@jimchamp

akramcodez avatar Dec 05 '25 06:12 akramcodez

PTAL @jimchamp, I’ve tried my best to fix the issue. Let me know if anything else needs attention.

akramcodez avatar Dec 05 '25 06:12 akramcodez

The first step to solving #11373 is to reproduce the bug. Please explain how you reproduced it.

jimchamp avatar Dec 05 '25 22:12 jimchamp

@jimchamp I reproduced the bug by targeting /partials.json (the actual carousel endpoint that internally calls render_cached_macro(), not the homepage HTML which only contains placeholders).

Reproduction: Stop Solr → Clear cache → Fetch /partials.json?macro=ReadingLogCarousel → Empty response gets cached via render_cached_macro() → Restart Solr → Bug: Empty carousel persists for 5 minutes despite Solr being healthy.

Local verification: I created reproduction scripts (reproduce_issue.py and test_carousel_cache_bug.sh) that demonstrate the issue. The scripts confirm that render_cached_macro() caches the error response, aligning with the issue description.

Production impact: When Solr experiences issues, all users see broken carousels even after recovery. My fix prevents caching error responses by adding conditional caching logic to memcache_memoize() and validating carousel content in render_cached_macro().

The reproduction scripts are local testing tools and not included in the PR. Would you like me to share them for verification, or would you prefer to review the fix implementation directly?

akramcodez avatar Dec 06 '25 06:12 akramcodez

If my reproduction approach isn't good, please let me know.

akramcodez avatar Dec 06 '25 06:12 akramcodez

The issue here is that our caching system is currently undergoing a major refactor. These changes will likely conflict with that effort.

This can be done without modifying cache.py. I've added some notes about this here.

We ask that contributors request to be assigned to issues before working on them. This gives contributors an opportunity to ask questions about issue before writing code that may not solve the issue in a satisfactory way. Please start doing this before working on future issues.

jimchamp avatar Dec 18 '25 22:12 jimchamp