PyAirbyte
Add multi-source support for caches
We have logged this issue to add support for data from multiple sources to be saved within the same cache.
Our implementation might already support this, since our internal caches and streams tables are (in theory) able to support data from multiple source names.
Before investing dev time, we should probably prioritize some tests to confirm whether this already works. As things stand, this is relatively low priority.
@bindipankhudi - Here is the example notebook I was referring to earlier.
https://colab.research.google.com/drive/1YC_vCfrEwO7SzZFCN1X2PwevMLeGYDeC#scrollTo=Y-0YC-Qhl80W
Specifically, this part:
While I didn't explicitly declare or assign a cache, I believe these would all default to the equivalent of get_default_cache().
Also, I'm not sure what would happen if these had streams sharing the same name.
When the same stream name exists in multiple sources, things don't work. For instance, in this notebook: https://colab.research.google.com/drive/197-utzu1I0iMd5Gua0tyFUL2Gu_LFws1?usp=shari we are using source-faker and source-github, both of which have a "users" stream. We load from GitHub first, and then loading from faker fails because the cache expects the schema columns from GitHub.
Let's see if we can fail with an accurate message.
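As a rough idea of what an accurate failure could look like, here is a minimal sketch in plain Python. It is not PyAirbyte code: `StreamNameCollisionError` and `check_stream_collisions` are hypothetical names, and the sketch assumes we can collect the stream names per source before writing to the cache.

```python
from collections import defaultdict


class StreamNameCollisionError(ValueError):
    """Hypothetical error type for illustration; not part of PyAirbyte."""


def check_stream_collisions(streams_by_source: dict[str, list[str]]) -> None:
    """Raise a descriptive error if two sources provide the same stream name.

    `streams_by_source` maps a source name to the stream names it would
    write into the shared cache (assumed to be known up front).
    """
    owners: dict[str, list[str]] = defaultdict(list)
    for source_name, stream_names in streams_by_source.items():
        # De-duplicate so a source never collides with itself.
        for stream_name in set(stream_names):
            owners[stream_name].append(source_name)

    collisions = {name: srcs for name, srcs in owners.items() if len(srcs) > 1}
    if collisions:
        details = "; ".join(
            f"'{name}' is provided by {', '.join(sorted(srcs))}"
            for name, srcs in sorted(collisions.items())
        )
        raise StreamNameCollisionError(
            "Cannot write multiple sources to the same cache because of "
            f"conflicting stream names: {details}."
        )
```

With the case above, `check_stream_collisions({"source-github": ["users", "issues"], "source-faker": ["users", "products"]})` would raise and name "users" along with both sources, instead of failing later on a column mismatch. The stream lists here are illustrative only.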
De-prioritizing and removing the iteration label for now. We will prioritize this if we hear related requests from customers.