
Add multi-source support for caches

Open aaronsteers opened this issue 1 year ago • 4 comments

This issue tracks adding support for saving data from multiple sources within the same cache.

Our implementation might already support this, since our internal caches and streams tables are (in theory) able to support data from multiple source names.

Before investing development effort, we should prioritize some tests to confirm whether this already works. As things stand, this is relatively low priority.

aaronsteers avatar Feb 04 '24 05:02 aaronsteers

@bindipankhudi - Here is the example notebook I was referring to earlier.

https://colab.research.google.com/drive/1YC_vCfrEwO7SzZFCN1X2PwevMLeGYDeC#scrollTo=Y-0YC-Qhl80W

Specifically, this part:

image

While I didn't explicitly declare or assign a cache, I believe these would all default to the equivalent of get_default_cache().

Also, I'm not sure what would happen if these had streams sharing the same name.

aaronsteers avatar Mar 26 '24 18:03 aaronsteers

When the same stream name exists in multiple sources, things don't work. For instance, in this notebook: https://colab.research.google.com/drive/197-utzu1I0iMd5Gua0tyFUL2Gu_LFws1?usp=shari we are using source-faker and source-github, both of which have a "users" schema. We load from GitHub first, and then loading from Faker fails because the cache still expects the schema columns from GitHub.
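
To illustrate the failure mode outside the notebook, here is a minimal sketch using only the standard-library sqlite3 module. It is not PyAirbyte's actual loading code: the `load_stream` helper and the column names (`login`, `name`, `email`) are assumptions made up for this example. It only shows how a table created from the first source's columns rejects records shaped like the second source's.

```python
import sqlite3


def load_stream(conn, table, records):
    """Hypothetical loader: create the table from the batch's columns if it
    does not exist yet, then insert every record into it."""
    columns = list(records[0].keys())
    col_list = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_list})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})',
        [tuple(record[c] for c in columns) for record in records],
    )


conn = sqlite3.connect(":memory:")

# Illustrative records only; real connector schemas have many more columns.
github_users = [{"id": 1, "login": "octocat"}]
faker_users = [{"id": 1, "name": "Jane Doe", "email": "jane@example.com"}]

# The first load creates "users" with the GitHub-shaped columns.
load_stream(conn, "users", github_users)

# The second load reuses the existing table, so the Faker columns are unknown.
try:
    load_stream(conn, "users", faker_users)
    second_load_error = None
except sqlite3.OperationalError as exc:
    second_load_error = str(exc)

print(second_load_error)
```

The second load fails with a low-level database error about a missing column, which says nothing about two sources sharing one stream name.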

bindipankhudi avatar Mar 28 '24 16:03 bindipankhudi

Let's see if we can fail with an accurate message.
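
One possible shape for that, sketched in plain Python rather than against PyAirbyte internals: record which source first wrote each stream name, and raise a descriptive error when a different source tries to write the same name. `StreamRegistry` and `StreamNameCollisionError` are hypothetical names invented for this sketch, not existing PyAirbyte classes.

```python
class StreamNameCollisionError(Exception):
    """Hypothetical error: two sources write the same stream name to one cache."""


class StreamRegistry:
    """Hypothetical registry mapping each stream name to its owning source."""

    def __init__(self):
        self._owners = {}  # stream name -> (source name, column names)

    def register(self, source_name, stream_name, columns):
        owner = self._owners.get(stream_name)
        if owner is not None and owner[0] != source_name:
            # Fail early, naming both sources and the stream involved.
            raise StreamNameCollisionError(
                f"Stream '{stream_name}' from source '{source_name}' conflicts "
                f"with a stream of the same name already cached from source "
                f"'{owner[0]}' (existing columns: {sorted(owner[1])}, "
                f"incoming columns: {sorted(columns)})."
            )
        self._owners[stream_name] = (source_name, tuple(columns))


registry = StreamRegistry()
registry.register("source-github", "users", ["id", "login"])

try:
    registry.register("source-faker", "users", ["id", "name", "email"])
    collision_message = None
except StreamNameCollisionError as exc:
    collision_message = str(exc)

print(collision_message)
```

Re-registering the same stream from the same source is allowed here, so repeated syncs from one source would not trip the check.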

bindipankhudi avatar Apr 01 '24 17:04 bindipankhudi

De-prioritizing and removing the iteration label for now. We will prioritize this if we hear related requests from customers.

bindipankhudi avatar Apr 22 '24 16:04 bindipankhudi