ocean icon indicating copy to clipboard operation
ocean copied to clipboard

[Core] Make cache_iterator_result preserve the chunks of items which were added

Open melodyogonna opened this issue 3 weeks ago • 4 comments

User description

Description

What - Cache_iterator_result should preserve cached items in the same chunks they're retrieved in.

Why - Some integrations have assumptions about how items are returned, and use this assumptions in other parts of the program (e.g to batch requests based on chunks of responses from an API).

How - Don't merge cache items together.

Type of change

Please leave one option from the following and delete the rest:

  • [ x ] Non-breaking change (fix of existing functionality that will not change current behavior)

All tests should be run against the port production environment(using a testing org).

Core testing checklist

  • [ ] Integration able to create all default resources from scratch
  • [ ] Resync finishes successfully
  • [ ] Resync able to create entities
  • [ ] Resync able to update entities
  • [ ] Resync able to detect and delete entities
  • [ ] Scheduled resync able to abort existing resync and start a new one
  • [ ] Tested with at least 2 integrations from scratch
  • [ ] Tested with Kafka and Polling event listeners
  • [ ] Tested deletion of entities that don't pass the selector

Integration testing checklist

  • [ ] Integration able to create all default resources from scratch
  • [ ] Completed a full resync from a freshly installed integration and it completed successfully
  • [ ] Resync able to create entities
  • [ ] Resync able to update entities
  • [ ] Resync able to detect and delete entities
  • [ ] Resync finishes successfully
  • [ ] If new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the examples folder in the integration directory.
  • [ ] If resource kind is updated, run the integration with the example data and check if the expected result is achieved
  • [ ] If new resource kind is added or updated, validate that live-events for that resource are working as expected
  • [ ] Docs PR link here

Preflight checklist

  • [ ] Handled rate limiting
  • [ ] Handled pagination
  • [ ] Implemented the code in async
  • [ ] Support Multi account

Screenshots

Include screenshots from your environment showing how the resources of the integration will look.

API Documentation

Provide links to the API documentation used for this integration.


PR Type

Bug fix


Description

  • Preserve chunk structure in cache_iterator_result to maintain integration assumptions

  • Changed caching logic to store and yield individual chunks separately

  • Prevents merging of cached items that breaks downstream batch processing

  • Added test to verify chunk preservation across cache hits


Diagram Walkthrough

flowchart LR
  A["cache_iterator_result decorator"] -->|Before: extend| B["Merged flat list"]
  A -->|After: append| C["Preserved chunks"]
  C -->|On cache hit| D["Yield individual chunks"]
  B -->|On cache hit| E["Yield merged list"]

File Walkthrough

Relevant files
Bug fix
cache.py
Preserve chunks in cache storage and retrieval                     

port_ocean/utils/cache.py

  • Changed cached_results.extend(result) to cached_results.append(result)
    to preserve chunk boundaries
  • Modified cache retrieval to iterate and yield individual chunks
    instead of yielding entire cache as one item
  • Maintains chunk structure throughout caching lifecycle
+3/-2     
Tests
test_cache.py
Add test for chunk preservation in cache                                 

port_ocean/tests/utils/test_cache.py

  • Added new test test_cache_iterator_maintains_chunks to verify chunk
    preservation
  • Tests that multiple yields are cached as separate chunks and retrieved
    in same structure
  • Validates cache hit behavior maintains original chunk boundaries
+34/-0   
Documentation
CHANGELOG.md
Document cache chunk preservation fix                                       

CHANGELOG.md

  • Added entry for version 0.30.5 documenting the cache chunk
    preservation improvement
  • Categorized under Improvements section
+7/-0     
Configuration changes
pyproject.toml
Update version to 0.30.5                                                                 

pyproject.toml

  • Bumped version from 0.30.4 to 0.30.5
+1/-1     

melodyogonna avatar Nov 26 '25 16:11 melodyogonna

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • [ ] Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit logs: The new caching control flow (yielding cached chunks) introduces critical behavior without
adding any audit logging of cache hits/misses or outcomes, making it unclear to
reconstruct events.

Referred Code
# Check if the result is already in the cache
try:
    if cache := await ocean.app.cache_provider.get(cache_key):
        for chunk in cache:
            yield chunk
        return
except FailedToReadCacheError as e:
    logger.warning(f"Failed to read cache for {cache_key}: {str(e)}")

# If not in cache, fetch the data
cached_results = list()
async for result in func(*args, **kwargs):
    cached_results.append(result)
    yield result

# Cache the results

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Edge case handling: Cache-hit path now iterates and yields chunks from cache without validating structure or
handling empty/invalid cached data, which may cause runtime issues if cache contains
unexpected values.

Referred Code
if cache := await ocean.app.cache_provider.get(cache_key):
    for chunk in cache:
        yield chunk
    return

Learn more about managing compliance generic rules or creating your own custom rules

  • [ ] Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-code-review[bot] avatar Nov 26 '25 16:11 qodo-code-review[bot]

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
This change is potentially breaking

The modification to cache_iterator_result to yield chunks instead of a single
merged list is a breaking change. This should be documented and communicated as
such, as it will break consumers expecting the previous behavior.

Examples:

port_ocean/utils/cache.py [89-91]
                    for chunk in cache:
                        yield chunk
                    return

Solution Walkthrough:

Before:

# On cache hit, the decorated function yields a single list.
# Consumer code:
results = []
async for items in decorated_function():
    # This loop runs only ONCE.
    # `items` is a single list: [item1, item2, item3, item4]
    results.append(items)

# `results` will be [[item1, item2, item3, item4]]

After:

# On cache hit, the decorated function yields multiple chunks.
# Consumer code:
results = []
async for chunk in decorated_function():
    # This loop runs for EACH chunk.
    # `chunk` is a list, e.g., [item1, item2] then [item3, item4]
    results.append(chunk)

# `results` will be [[item1, item2], [item3, item4]]

Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a critical, overlooked aspect of the change: it alters the public contract of the cache_iterator_result decorator, making it a breaking change for existing consumers, yet it's classified as a non-breaking "Improvement".

High
Possible issue
Handle old cache format gracefully

Add a check to handle the old cache format to prevent runtime errors. If an old
format is detected, treat it as a cache miss, allowing the data to be re-fetched
and stored in the new chunked format.

port_ocean/utils/cache.py [88-91]

 if cache := await ocean.app.cache_provider.get(cache_key):
-    for chunk in cache:
-        yield chunk
-    return
+    # New format is list of lists (chunks). Old format was a flat list.
+    # An empty list is a valid cached result for an empty iterator.
+    if (cache and isinstance(cache[0], list)) or not cache:
+        for chunk in cache:
+            yield chunk
+        return
+    else:
+        logger.info(
+            f"Old cache format detected for {cache_key}. Re-fetching data."
+        )
  • [ ] Apply / Chat
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a critical backward compatibility issue where old cache entries would cause runtime errors with the new logic, and it provides a robust, self-healing solution.

High
  • [ ] Update

qodo-code-review[bot] avatar Nov 26 '25 16:11 qodo-code-review[bot]

Code Coverage Artifact 📈: https://github.com/port-labs/ocean/actions/runs/19710247936/artifacts/4688492598

Code Coverage Total Percentage: 88.56%

github-actions[bot] avatar Nov 26 '25 16:11 github-actions[bot]

Code Coverage Artifact 📈: https://github.com/port-labs/ocean/actions/runs/19739524760/artifacts/4698705101

Code Coverage Total Percentage: 88.62%

github-actions[bot] avatar Nov 27 '25 14:11 github-actions[bot]

Code Coverage Artifact 📈: https://github.com/port-labs/ocean/actions/runs/19815920280/artifacts/4721881748

Code Coverage Total Percentage: 88.6%

github-actions[bot] avatar Dec 01 '25 08:12 github-actions[bot]