airbyte
low-code: Yield records from generators instead of keeping them in in-memory lists
What
Improve memory usage by yielding records from generators instead of returning lists of objects
This PR addresses part of https://github.com/airbytehq/airbyte-internal-issues/issues/6554
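The memory difference between the two approaches can be shown in plain Python. This is an illustrative sketch, not CDK code: `parse_as_list` and `parse_as_generator` are hypothetical helpers standing in for a step that turns rows from a response into records.

```python
from typing import Any, Dict, Iterable, Iterator, List, Mapping

Record = Mapping[str, Any]


def parse_as_list(rows: Iterable[Dict[str, Any]]) -> List[Record]:
    # Eager (hypothetical helper): every record is built and held in memory
    # before the caller sees the first one.
    return [dict(row) for row in rows]


def parse_as_generator(rows: Iterable[Dict[str, Any]]) -> Iterator[Record]:
    # Lazy (hypothetical helper): each record is built only when the caller
    # asks for it, so it can be emitted and released before the next one.
    for row in rows:
        yield dict(row)


if __name__ == "__main__":
    produced: List[int] = []

    def source() -> Iterator[Dict[str, Any]]:
        for i in range(3):
            produced.append(i)  # track how far the source has been consumed
            yield {"id": i}

    records = parse_as_generator(source())
    first = next(records)
    print(first, produced)  # only the first row has been read so far
```

With the generator, pulling one record reads one row; the list version reads all rows up front.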
How
- Update the record selector, extractor, and filter interfaces to work on generators instead of lists of records
- Update the paginator interface to use only the number of records read and the last record, instead of the full list of records read
- Update the simple retriever to tie everything together
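A minimal sketch of how the three changes can fit together: extractor and filter chained as generators, and a retriever loop that keeps only a record count and the last record for the paginator. Every name here (`extract_records`, `filter_records`, `track`, `PageSizePaginator`, `next_page_token`, `read_all`) is a hypothetical simplification for illustration, not the CDK's actual classes or signatures.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, Mapping, Optional

Record = Mapping[str, Any]
Token = Optional[Dict[str, Any]]


def extract_records(body: str, field: str) -> Iterator[Record]:
    # Hypothetical extractor. json.loads still parses the whole body, but
    # records are handed downstream one at a time.
    yield from json.loads(body).get(field, [])


def filter_records(records: Iterable[Record], min_id: int) -> Iterator[Record]:
    # Hypothetical filter: drops records lazily instead of building a new list.
    for record in records:
        if record["id"] >= min_id:
            yield record


def track(records: Iterable[Record], page: Dict[str, Any]) -> Iterator[Record]:
    # Keep a running count and one reference to the latest extracted record,
    # rather than a list of the whole page.
    for record in records:
        page["count"] += 1
        page["last"] = record
        yield record


class PageSizePaginator:
    """Hypothetical paginator that only needs the last record and a count."""

    def __init__(self, page_size: int) -> None:
        self.page_size = page_size

    def next_page_token(self, last_record: Optional[Record], records_read: int) -> Token:
        # A short (or empty) page means there is nothing left to fetch.
        if last_record is None or records_read < self.page_size:
            return None
        return {"after_id": last_record["id"]}


def read_all(
    fetch_page: Callable[[Token], str],
    paginator: PageSizePaginator,
    min_id: int = 0,
) -> Iterator[Record]:
    """Hypothetical retriever loop tying extractor, filter, and paginator together."""
    token: Token = None
    while True:
        page: Dict[str, Any] = {"count": 0, "last": None}
        body = fetch_page(token)
        yield from filter_records(track(extract_records(body, "data"), page), min_id)
        token = paginator.next_page_token(page["last"], page["count"])
        if token is None:
            return
```

In this sketch the count and last record are taken before filtering, so a filter that drops records from a full page does not make pagination stop early. Whether the CDK counts before or after filtering is not stated in this PR; treat it as a choice of the sketch.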
Reading order
- airbyte-cdk/python/airbyte_cdk/sources/declarative/extractors/record_extractor.py
- airbyte-cdk/python/airbyte_cdk/sources/declarative/extractors/http_selector.py
- airbyte-cdk/python/airbyte_cdk/sources/declarative/extractors/dpath_extractor.py
- airbyte-cdk/python/airbyte_cdk/sources/declarative/extractors/record_selector.py
- airbyte-cdk/python/airbyte_cdk/sources/declarative/extractors/record_filter.py
- airbyte-cdk/python/airbyte_cdk/sources/declarative/requesters/paginators/paginator.py
- airbyte-cdk/python/airbyte_cdk/sources/declarative/requesters/paginators/no_pagination.py
- airbyte-cdk/python/airbyte_cdk/sources/declarative/requesters/paginators/default_paginator.py
- airbyte-cdk/python/airbyte_cdk/sources/declarative/requesters/paginators/strategies/pagination_strategy.py
- airbyte-cdk/python/airbyte_cdk/sources/declarative/requesters/paginators/strategies/offset_increment.py
- airbyte-cdk/python/airbyte_cdk/sources/declarative/requesters/paginators/strategies/page_increment.py
- airbyte-cdk/python/airbyte_cdk/sources/declarative/requesters/paginators/strategies/cursor_pagination_strategy.py
- airbyte-cdk/python/airbyte_cdk/sources/declarative/requesters/paginators/strategies/stop_condition.py
- airbyte-cdk/python/airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
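The pagination strategies in the reading order can each decide on the next token from the same two inputs, the last record and the number of records read. The sketch below is illustrative only: the class names loosely echo `offset_increment.py`, `page_increment.py`, `cursor_pagination_strategy.py`, and `stop_condition.py`, but the `next_page_token(last_record, records_read)` method and all constructor parameters are assumptions, not the CDK's interface. Reading a cursor from the last record is also just one option; many APIs return the cursor in the response body or headers instead.

```python
from typing import Any, Callable, Mapping, Optional

Record = Mapping[str, Any]


class OffsetIncrement:
    """Hypothetical: advance the offset by the number of records on a full page."""

    def __init__(self, page_size: int) -> None:
        self.page_size = page_size
        self.offset = 0

    def next_page_token(self, last_record: Optional[Record], records_read: int) -> Optional[int]:
        if records_read < self.page_size:
            return None  # a short page is the last page
        self.offset += records_read
        return self.offset


class PageIncrement:
    """Hypothetical: move to the next page number while pages come back full."""

    def __init__(self, page_size: int, start_from_page: int = 0) -> None:
        self.page_size = page_size
        self.page = start_from_page

    def next_page_token(self, last_record: Optional[Record], records_read: int) -> Optional[int]:
        if records_read < self.page_size:
            return None
        self.page += 1
        return self.page


class CursorFromLastRecord:
    """Hypothetical: take the next cursor from a field on the last record only."""

    def __init__(self, cursor_field: str) -> None:
        self.cursor_field = cursor_field

    def next_page_token(self, last_record: Optional[Record], records_read: int) -> Optional[Any]:
        if last_record is None:
            return None
        return last_record.get(self.cursor_field)


class StopWhen:
    """Hypothetical decorator: stop paginating once the last record meets a condition."""

    def __init__(self, strategy: Any, condition: Callable[[Record], bool]) -> None:
        self.strategy = strategy
        self.condition = condition

    def next_page_token(self, last_record: Optional[Record], records_read: int) -> Optional[Any]:
        if last_record is not None and self.condition(last_record):
            return None
        return self.strategy.next_page_token(last_record, records_read)
```

None of these strategies needs the full page, which is what lets the retriever release records as soon as they are yielded.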
1 Ignored Deployment
Name | Status | Preview | Comments | Updated (UTC) |
---|---|---|---|---|
airbyte-docs | ⬜️ Ignored (Inspect) | Visit Preview | May 14, 2024 11:02pm |
Confirmed that this change, combined with some changes on the Iterable side, helps reduce the connector's memory usage.
In the memory usage graph for these runs, the large spikes show attempts with all the fixes, and the last (much lower) one shows memory usage with this change, generators in the custom components, and smaller time windows.
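Smaller time windows generally reduce how much data any single request returns. A minimal standard-library sketch of slicing a date range into fixed-size windows; `time_windows` and its `step` parameter are hypothetical names, and this is not the Iterable connector's actual slicing code.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def time_windows(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive [window_start, window_end) ranges covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    window_start = start
    while window_start < end:
        # The last window is clipped so it never runs past `end`.
        window_end = min(window_start + step, end)
        yield window_start, window_end
        window_start = window_end


if __name__ == "__main__":
    start = datetime(2024, 5, 1, tzinfo=timezone.utc)
    end = datetime(2024, 5, 3, 12, tzinfo=timezone.utc)
    # A smaller step means more requests, but each one covers less data.
    for window_start, window_end in time_windows(start, end, timedelta(days=1)):
        print(window_start.isoformat(), "->", window_end.isoformat())
```

The trade-off is more requests in exchange for a lower peak response size per request.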
The underlying issue is that Iterable returns gigantic responses (I've seen one of ~4 GB). I think fixing this would require streaming the responses, which is out of scope for this PR.
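Streaming would avoid holding a whole response in memory. As a rough sketch, and assuming the body is newline-delimited JSON (an assumption, not something this PR confirms), records can be parsed line by line from byte chunks as they arrive. With `requests`, such chunks could come from `Response.iter_content()` on a request made with `stream=True`; the sketch itself stays standard-library only and accepts any iterable of bytes. `iter_ndjson_records` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_ndjson_records(chunks: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per line, holding only the current chunk and a partial line."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Every complete line can be parsed and released now; the trailing
        # piece (possibly empty) is an unfinished line kept for the next chunk.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    # The body may not end with a newline; flush whatever is left.
    if buffer.strip():
        yield json.loads(buffer)
```

This only helps if the API actually returns one record per line; a single large JSON document would need an incremental JSON parser instead.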