
[Integration][AWS-V3] Improve Memory Management On AWS-V3

Open mk-armah opened this issue 1 month ago • 2 comments

User description

Description

What -

Why -

How -

Type of change

Please leave one option from the following and delete the rest:

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] New Integration (non-breaking change which adds a new integration)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Non-breaking change (fix of existing functionality that will not change current behavior)
  • [ ] Documentation (added/updated documentation)

All tests should be run against the Port production environment (using a testing org).

Core testing checklist

  • [ ] Integration able to create all default resources from scratch
  • [ ] Resync finishes successfully
  • [ ] Resync able to create entities
  • [ ] Resync able to update entities
  • [ ] Resync able to detect and delete entities
  • [ ] Scheduled resync able to abort existing resync and start a new one
  • [ ] Tested with at least 2 integrations from scratch
  • [ ] Tested with Kafka and Polling event listeners
  • [ ] Tested deletion of entities that don't pass the selector

Integration testing checklist

  • [ ] Integration able to create all default resources from scratch
  • [ ] Completed a full resync from a freshly installed integration and it completed successfully
  • [ ] Resync able to create entities
  • [ ] Resync able to update entities
  • [ ] Resync able to detect and delete entities
  • [ ] Resync finishes successfully
  • [ ] If new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the examples folder in the integration directory.
  • [ ] If resource kind is updated, run the integration with the example data and check if the expected result is achieved
  • [ ] If new resource kind is added or updated, validate that live-events for that resource are working as expected
  • [ ] Docs PR link here

Preflight checklist

  • [ ] Handled rate limiting
  • [ ] Handled pagination
  • [ ] Implemented the code in async
  • [ ] Support Multi account

Screenshots

Include screenshots from your environment showing how the resources of the integration will look.

API Documentation

Provide links to the API documentation used for this integration.


PR Type

Bug fix


Description

  • Reduce concurrent region and account limits to prevent memory exhaustion

  • Replace Semaphore with BoundedSemaphore so that an unmatched release() raises ValueError instead of silently raising the concurrency limit

  • Add explicit garbage collection and resource deletion after processing

  • Reorder S3 bucket actions to optimize memory usage patterns


Diagram Walkthrough

flowchart LR
  A["Memory Issues"] -->|Reduce Concurrency| B["_MAX_CONCURRENT_REGIONS: 10→5"]
  A -->|Replace Semaphore| C["Semaphore→BoundedSemaphore"]
  A -->|Explicit Cleanup| D["gc.collect() + del statements"]
  A -->|Reorder Actions| E["S3 bucket action optimization"]
  B --> F["Improved Memory Management"]
  C --> F
  D --> F
  E --> F

File Walkthrough

Relevant files
Enhancement
actions.py
Reorder S3 bucket actions for memory optimization               

integrations/aws-v3/aws/core/exporters/s3/bucket/actions.py

  • Reordered S3BucketActionsMap defaults and options lists
  • Moved GetBucketLocationAction from defaults to options
  • Moved GetBucketTaggingAction before ListBucketsAction in defaults
  • Optimizes action execution order to reduce memory footprint
+2/-2     
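The reordering can be pictured with the sketch below. It is an assumption-heavy illustration: the action class names come from this PR, but the `Action` base class, the dataclass layout, and the `resolve` helper are hypothetical, and the real `S3BucketActionsMap` may contain more actions and a different interface.

```python
from dataclasses import dataclass, field


class Action:
    """Hypothetical base class for an S3 bucket action."""


class GetBucketTaggingAction(Action):
    pass


class ListBucketsAction(Action):
    pass


class GetBucketLocationAction(Action):
    pass


@dataclass
class S3BucketActionsMap:
    # Defaults always run; tagging now precedes ListBucketsAction.
    defaults: list = field(
        default_factory=lambda: [GetBucketTaggingAction, ListBucketsAction]
    )
    # Options run only when requested; location was moved here from defaults.
    options: list = field(default_factory=lambda: [GetBucketLocationAction])

    def resolve(self, include):
        """Hypothetical helper: defaults plus any explicitly requested options."""
        requested = [a for a in self.options if a.__name__ in include]
        return self.defaults + requested
```

Under this sketch, a resync that does not request `GetBucketLocationAction` skips that call entirely, which is one plausible way moving an action from defaults to options could reduce per-bucket work.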
Bug fix
resync.py
Implement memory leak fixes and resource cleanup                 

integrations/aws-v3/resync.py

  • Reduced _MAX_CONCURRENT_REGIONS from 10 to 5
  • Removed _MAX_CONCURRENT_ACCOUNTS constant (was 5)
  • Replaced asyncio.Semaphore with asyncio.BoundedSemaphore, which raises
    ValueError on a release() that has no matching acquire()
  • Added explicit garbage collection (gc.collect()) after batch
    processing
  • Added explicit resource deletion (del exporter, options_factory,
    strategy)
  • Added logging for successful global resource fetching
  • Removed fallback to _MAX_CONCURRENT_ACCOUNTS in
    max_concurrent_accounts initialization
  • Reordered semaphore initialization before task list creation
  • Removed commented-out alternative implementation code
+65/-8   
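The explicit cleanup listed above can be sketched as follows. The `Exporter` class, its self-reference, and `process_batch` are hypothetical stand-ins; only the pattern of `del` followed by `gc.collect()` after batch processing is taken from this PR. The sketch uses a reference cycle on purpose, because that is the case where `del` alone does not release the object and a collector pass is needed.

```python
import gc
import weakref


class Exporter:
    """Hypothetical stand-in for an exporter that holds a large buffer."""

    def __init__(self, region):
        self.region = region
        self.buffer = [0] * 10_000
        self.owner = self  # reference cycle: refcounting alone cannot free this

    def export(self):
        return f"exported:{self.region}"


def process_batch(regions):
    results = []
    refs = []  # weak references let us observe whether objects were freed
    for region in regions:
        exporter = Exporter(region)
        refs.append(weakref.ref(exporter))
        results.append(exporter.export())
        del exporter  # drop the local name so nothing else keeps it alive
    gc.collect()  # reclaim cyclic garbage left over from the batch
    return results, refs


if __name__ == "__main__":
    results, refs = process_batch(["us-east-1", "eu-west-1"])
    print(results, all(ref() is None for ref in refs))
```

For objects without cycles, CPython frees them as soon as the last reference is dropped, so `gc.collect()` mainly helps when exporters, clients, or callbacks reference each other.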

mk-armah, Oct 23 '25 13:10