[New connector request] SQLite source and destination connector

Open yuhuishi-convect opened this issue 3 years ago • 4 comments

Tell us about the new connector you’d like to have

  • Which source and which destination? SQLite database as both the source and destination connector.

  • Do you need a specific version of the underlying data source e.g: you specifically need support for an older version of the API or DB? NA

Describe the context around this new connector

  • Why do you need this integration? How does your team intend to use the data? This helps us understand the use case.

We would like to support syncing a bundle of related files into multiple tables in a destination database. SQLite is a good way to group multiple structured files into a SINGLE file. This is useful, for example, when we have multiple CSV files from a data dump of an e-commerce store (e.g., products, orders, inventory): we can first load them into one SQLite file and distribute that file conveniently. Then, if SQLite is supported as a source connector, the tables in the SQLite database can be synced to destination databases.
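As a rough illustration of the bundling step described above, here is a minimal sketch using only Python's standard `sqlite3` and `csv` modules. The function name `bundle_csvs` and the convention of naming each table after its file are assumptions made for this example, not part of any Airbyte connector. All values are stored as text, since no column types are inferred.

```python
import csv
import sqlite3
from pathlib import Path


def bundle_csvs(csv_paths, db_path):
    """Load each CSV file into its own table of a single SQLite file.

    Hypothetical helper: the table name is taken from the file name
    (products.csv -> table "products") and the first row is the header.
    """
    conn = sqlite3.connect(db_path)
    try:
        for path in map(Path, csv_paths):
            table = path.stem
            with path.open(newline="") as f:
                reader = csv.reader(f)
                header = next(reader)
                cols = ", ".join(f'"{c}"' for c in header)
                marks = ", ".join("?" for _ in header)
                # Columns are declared without types, so values stay as text.
                conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
                conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
        conn.commit()
    finally:
        conn.close()
```

For example, `bundle_csvs(["products.csv", "orders.csv", "inventory.csv"], "store.sqlite")` would produce one `store.sqlite` file with three tables.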

Loading multiple heterogeneous files into different tables is currently impossible, since the file source connector can only load data into a single table, whereas the files have heterogeneous schemas. The workaround is to create multiple file source connectors, each handling one file. But this makes the "T" step hard after the "L" step, since "L" is done with multiple connectors.

It would also be very convenient to have SQLite as a destination connector, since that makes it easy to dump data from a source into a single file.

  • How often do you want to run syncs? On-demand

  • If this is an API source connector, which entities/endpoints do you need supported? NA

  • If the connector is for a paid service, can we name you as a mutual user when we subscribe for an account? Which company should we name? NA

Describe the alternative you are considering or using

What are you considering doing if you don’t have this integration through Airbyte? I write ad-hoc Python scripts to connect to a SQLite database and load everything into destination tables.
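An ad-hoc script of the kind mentioned above might look roughly like the following sketch. It only covers the read side: it lists the user tables in a SQLite file and returns their rows as dictionaries. The name `read_all_tables` is hypothetical, and writing the rows to a destination is left out because it depends on the target database.

```python
import sqlite3


def read_all_tables(db_path):
    """Return {table_name: [row_dict, ...]} for every user table in the file."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    try:
        tables = [
            r["name"]
            for r in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        return {
            t: [dict(row) for row in conn.execute(f'SELECT * FROM "{t}"')]
            for t in tables
        }
    finally:
        conn.close()
```

This reads every table fully into memory, which is acceptable for small dumps but would need batching for larger files.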

Are you willing to submit a PR?

Yes.

yuhuishi-convect avatar Mar 08 '22 00:03 yuhuishi-convect

i'm at Airbyte now and would be interested in shepherding at least a SQLite destination connector first through if you are willing to work on it

swyxio avatar Jul 03 '22 23:07 swyxio

i'm at Airbyte now and would be interested in shepherding at least a SQLite destination connector first through if you are willing to work on it

Cool. Will work on one based on JDBCdestination

yuhuishi-convect avatar Jul 11 '22 07:07 yuhuishi-convect

hello @sw-yx I have a PR for SQLite destination.

Would you mind taking a look at it?

#15018

yuhuishi-convect avatar Jul 25 '22 19:07 yuhuishi-convect

@yuhuishi-convect took a brief look - it's very cool! destination more important than source, so i am glad you went with the hard part first.

the main thing we will need is time for someone from @marcosmarxm's team to do a proper engineering review, but i do think this will be a big step for us to create personal data warehouses and offer airbyte for smaller use cases

swyxio avatar Jul 30 '22 01:07 swyxio

Tagging this for the new destinations team (sql source has already been added by @yuhuishi-convect )

grishick avatar Sep 27 '22 18:09 grishick

very excited! destination > source :)

swyxio avatar Sep 27 '22 22:09 swyxio

I'm interested in this! For SQLite as the source, do you have any thoughts about how we would host the file?

A few ideas:

  • Locally on the machine where Airbyte runs
  • In an S3 bucket
  • Via some kind of connector (SFTP?)

I'm currently using S3 + CSV, but an SQLite source appeals because I could place all the data in one database and also rely on the primary id in SQLite for incremental updates, rather than the date on the CSV file.
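To make the incremental idea concrete, here is a minimal sketch of reading only rows whose primary id is greater than the last one already synced. The function `read_new_rows` and its parameters are assumptions for illustration; this is not how any Airbyte connector is known to implement it. Note that a cursor on an increasing id only picks up inserted rows, not updates or deletes.

```python
import sqlite3


def read_new_rows(conn, table, id_column, last_seen_id):
    """Return (rows, new_cursor) for rows with id_column > last_seen_id."""
    query = (
        f'SELECT * FROM "{table}" '
        f'WHERE "{id_column}" > ? ORDER BY "{id_column}"'
    )
    cur = conn.execute(query, (last_seen_id,))
    names = [d[0] for d in cur.description]
    rows = [dict(zip(names, r)) for r in cur]
    # Keep the old cursor if nothing new arrived.
    new_cursor = rows[-1][id_column] if rows else last_seen_id
    return rows, new_cursor
```

The caller would persist `new_cursor` between runs and pass it back as `last_seen_id` on the next sync.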

Rodeoclash avatar Oct 03 '22 21:10 Rodeoclash

Probably similar to other local files:

Locally on the machine where Airbyte runs

That is the best way to start; S3 or other options could be added later as improvements.

marcosmarxm avatar Oct 04 '22 16:10 marcosmarxm