PyAirbyte
💡 Feature Proposal: Add a way to clean up after a lazy `Source.get_records()` generator is abandoned
Today, the `Source.get_records()` method returns a `LazyDataset` that can be iterated over to get records.
Given a source declared like this:
import airbyte as ab
source = ab.get_source(...)
You can iterate over records lazily like this:
max_records = 10
dataset: ab.LazyDataset = source.get_records("my_stream")
for i, record in enumerate(dataset):
    # enumerate() yields (index, record); stop once max_records have been printed.
    if i >= max_records:
        break
    print(record)
This approach uses the dataset as an iterator and then aborts after the necessary count of records is found.
However, the connector process itself is not shut down when we stop iterating from it.
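To illustrate the underlying behavior, here is a minimal sketch in plain Python. It assumes the lazy dataset behaves like a generator that holds a resource open; `read_records` and the `closed` list are hypothetical stand-ins, not PyAirbyte code. Breaking out of the loop leaves the generator suspended, so its cleanup does not run until something calls `close()` on it (or it is garbage collected).

```python
closed = []  # records whether the generator's cleanup has run


def read_records():
    # Hypothetical stand-in for a connector-backed record stream.
    try:
        for i in range(100):
            yield {"id": i}
    finally:
        # Stand-in for shutting down the connector process.
        closed.append(True)


dataset = read_records()
for i, record in enumerate(dataset):
    if i >= 2:
        break

# The loop has ended, but `dataset` is still referenced and suspended,
# so the `finally` block has not run yet.
still_open = not closed

# Explicitly closing the generator triggers the cleanup.
dataset.close()
closed_after = bool(closed)
```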
Improvement Proposal
Ideally, we'd add a callback to close the connection on the lazy dataset - and/or we'd operate like a context manager and auto-clean up the process when the context manager exits.
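As a rough sketch of what this could look like (not an existing PyAirbyte API): the `ClosableLazyDataset` class, its `on_close` parameter, and `fake_records` below are all hypothetical names used for illustration. The wrapper takes any record iterator plus a cleanup callback, exposes an explicit `close()`, and also works as a context manager.

```python
from typing import Callable, Iterator


class ClosableLazyDataset:
    """Hypothetical wrapper; not part of PyAirbyte."""

    def __init__(self, records: Iterator[dict], on_close: Callable[[], None]) -> None:
        self._records = records
        self._on_close = on_close
        self._closed = False

    def __iter__(self) -> "ClosableLazyDataset":
        return self

    def __next__(self) -> dict:
        if self._closed:
            raise StopIteration
        try:
            return next(self._records)
        except StopIteration:
            # The stream is exhausted, so clean up right away.
            self.close()
            raise

    def close(self) -> None:
        # Idempotent: the callback runs at most once.
        if self._closed:
            return
        self._closed = True
        self._on_close()

    def __enter__(self) -> "ClosableLazyDataset":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()


def fake_records() -> Iterator[dict]:
    # Stand-in for records coming from a connector process.
    for i in range(1000):
        yield {"id": i}


events = []  # stand-in for "the connector process was shut down"
seen = []
with ClosableLazyDataset(fake_records(), on_close=lambda: events.append("stopped")) as dataset:
    for i, record in enumerate(dataset):
        if i >= 3:
            break
        seen.append(record)
# Leaving the `with` block calls close(), even though iteration stopped early.
```

In a real implementation, the callback would presumably terminate the connector subprocess. How that process is tracked internally is not covered here.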
In practice, this has not caused any problems for our use cases - but it would be good to improve handling here.