
💡 Feature Proposal: Add a way to clean up after a lazy `Source.get_records()` generator is abandoned

Open · aaronsteers opened this issue 1 year ago · 0 comments

Today, the `Source.get_records()` method returns a `LazyDataset` that can be iterated over to get records.

Given a source declared like this:

```python
import airbyte as ab

source = ab.get_source(...)
```

You can iterate over records lazily like this:

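```python
max_records = 10

dataset: ab.LazyDataset = source.get_records("my_stream")
# enumerate() yields (index, item) pairs, so the index comes first.
for i, record in enumerate(dataset):
    if i >= max_records:
        break  # stop once max_records records have been printed
    print(record)
```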

This approach uses the dataset as an iterator and breaks out of the loop once the desired number of records has been read.

However, the connector process itself is not shut down when we stop iterating over the dataset: breaking out of the loop does not terminate it.
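The underlying Python behavior can be illustrated with a plain generator, assuming the lazy dataset wraps a generator-like iterator (this is not PyAirbyte's internal code). In this sketch, `read_records` and the `state` flag are hypothetical stand-ins: the generator's `finally` block plays the role of connector shutdown, and no real subprocess is involved. The point is that `break` only leaves the loop; cleanup runs when the generator is exhausted, explicitly closed, or garbage-collected.

```python
from __future__ import annotations

from collections.abc import Iterator

# Hypothetical stand-in: a flag instead of a real connector subprocess.
state = {"process_running": False}


def read_records(n: int) -> Iterator[dict]:
    """Yield fake records; the `finally` block plays the role of process shutdown."""
    state["process_running"] = True
    try:
        for i in range(n):
            yield {"id": i}
    finally:
        # Runs only when the generator is exhausted, closed, or garbage-collected.
        state["process_running"] = False


max_records = 10
records = read_records(1_000)
taken = []
for i, record in enumerate(records):
    if i >= max_records:
        break
    taken.append(record)

# The loop has exited, but `records` still references the suspended generator,
# so its `finally` block has not run yet.
running_after_break = state["process_running"]

records.close()  # raises GeneratorExit at the paused `yield`, running `finally`
running_after_close = state["process_running"]
```

After the `break`, `running_after_break` is still `True`; only the explicit `records.close()` call flips the flag back to `False`.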

Improvement Proposal

Ideally, we'd add a callback on the lazy dataset to close the connection, and/or have the dataset act as a context manager that automatically cleans up the process when the context exits.
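A minimal sketch of the proposed behavior, assuming the lazy dataset owns the connector's child process. `ClosableRecordStream` and `CHILD` are hypothetical names, not part of PyAirbyte's API; a tiny Python child process that prints JSON lines forever stands in for a connector. The sketch offers both options from the proposal: an explicit `close()` and context-manager support that calls it on exit.

```python
from __future__ import annotations

import json
import subprocess
import sys
from collections.abc import Iterator
from types import TracebackType

# Hypothetical "connector": emits one JSON record per line, forever.
CHILD = (
    "import itertools, json, sys\n"
    "for i in itertools.count():\n"
    "    sys.stdout.write(json.dumps({'id': i}) + '\\n')\n"
    "    sys.stdout.flush()\n"
)


class ClosableRecordStream:
    """Hypothetical lazy record stream that owns a child process."""

    def __init__(self, args: list[str]) -> None:
        self._proc = subprocess.Popen(args, stdout=subprocess.PIPE, text=True)

    def __iter__(self) -> Iterator[dict]:
        assert self._proc.stdout is not None
        for line in self._proc.stdout:
            yield json.loads(line)

    @property
    def is_running(self) -> bool:
        return self._proc.poll() is None

    def close(self) -> None:
        """Terminate the child process if still running; safe to call twice."""
        if self._proc.poll() is None:
            self._proc.terminate()
        self._proc.wait()  # reap the process so it does not linger
        if self._proc.stdout is not None:
            self._proc.stdout.close()

    def __enter__(self) -> ClosableRecordStream:
        return self

    def __exit__(
        self,
        exc_type: type[BaseException] | None,
        exc: BaseException | None,
        tb: TracebackType | None,
    ) -> None:
        self.close()  # runs even if the loop body raised


max_records = 10
with ClosableRecordStream([sys.executable, "-c", CHILD]) as stream:
    first_records = []
    for i, record in enumerate(stream):
        if i >= max_records:
            break
        first_records.append(record)
    running_inside = stream.is_running  # child is still alive after `break`
running_after_exit = stream.is_running  # __exit__ terminated it
```

Making `close()` idempotent lets the context manager and an explicit callback coexist: callers who cannot use a `with` block can call `close()` themselves, and a second call from `__exit__` does no harm.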

In practice, this has not caused any problems for our use cases, but it would be good to improve the handling here.

aaronsteers, Nov 06 '24 21:11