PyAirbyte
💡 Feature Request: Ability to use database-type sources, such as `source-postgres` and `source-mysql`
Because most database sources are written in Java, they currently cannot run in Python environments.
This feature would allow database-type sources to be run from PyAirbyte. Possible implementation options could be:
- Allow Docker containers to be invoked by PyAirbyte.
  - Important: While this is technically not a big lift, I'm not sure we want to take this approach, as it would create a very sharp difference in user experience between runtimes that have Docker access and those that do not.
- Find a way to package and install Java connectors as standalone executables.
  - This is in theory also technically feasible, but this approach is subject to its own sharp edges, such as needing pre-built Java executables for *n* platforms/runtimes.
Available Workarounds
Workaround # 1: Pre-Installing the Java-based connector
One workaround is to pre-install the Java-based connector on your local machine or docker image, and then create a CLI which can mimic the CLI of a Python-based connector. If registered on PATH, PyAirbyte will find this connector and not know/care what language it is written in.
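As a rough illustration of Workaround #1, here is a minimal sketch of such a shim, assuming the Java connector is already installed and can be launched from a single command. The launcher path, the `JAVA_CONNECTOR_CMD` environment variable, and the `build_command` helper are all hypothetical; the only thing taken from the connector contract is the set of protocol commands (`spec`, `check`, `discover`, `read`).

```python
"""Hypothetical `source-postgres` shim that forwards to a pre-installed Java connector."""
import os
import subprocess
import sys

# Hypothetical launcher for the pre-installed Java connector; override it
# with the JAVA_CONNECTOR_CMD environment variable to match your install.
JAVA_CONNECTOR_CMD = os.environ.get(
    "JAVA_CONNECTOR_CMD", "/opt/airbyte/source-postgres/bin/source-postgres"
)

# Commands defined by the Airbyte connector protocol.
SUPPORTED_COMMANDS = ("spec", "check", "discover", "read")


def build_command(argv: list[str]) -> list[str]:
    """Check the protocol command, then forward every argument unchanged."""
    if not argv or argv[0] not in SUPPORTED_COMMANDS:
        usage = "|".join(SUPPORTED_COMMANDS)
        raise SystemExit(f"usage: source-postgres {{{usage}}} [options]")
    return [JAVA_CONNECTOR_CMD, *argv]


def main() -> int:
    cmd = build_command(sys.argv[1:])
    # stdin/stdout/stderr are inherited, so the caller sees the Java
    # connector's Airbyte message stream exactly as it was emitted.
    return subprocess.run(cmd, check=False).returncode
```

One way to get this onto PATH under the connector's name is a `console_scripts` entry point (for example `source-postgres = postgres_shim:main`, with a hypothetical module name), which is also how Python connectors expose their CLI.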
Workaround # 2: Treating the source DB as an externally-managed "cache"
An alternative workaround, which admittedly would not solve all use cases, would be what is described in this issue:
- #85
Hypothetically, we could make a wrapper source, `source-docker-wrapper`, that takes a config key such as `__injected_source_image`, tries to spin up Docker to run that source, and proxies its output to PyAirbyte. Alternatively, we could build this natively into PyAirbyte itself instead of using a proxy source.
Pros:
- We get DBs to work quickly.
Cons:
- This would require Docker.
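To make the wrapper-source idea concrete, here is a minimal sketch, assuming the wrapper receives the usual `--config`/`--catalog`/`--state` file arguments as flag/path pairs. Only the `__injected_source_image` key comes from the proposal above; the mount point, helper names, and argument handling are hypothetical.

```python
"""Sketch of the hypothetical `source-docker-wrapper` proxy source."""
import json
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional

IMAGE_KEY = "__injected_source_image"  # config key proposed in the thread
MOUNT_DIR = "/airbyte_inputs"  # hypothetical mount point inside the container
FILE_FLAGS = ("--config", "--catalog", "--state")


def split_config(config: dict) -> tuple[str, dict]:
    """Return (image, config_without_image) without mutating the input."""
    inner = dict(config)
    image = inner.pop(IMAGE_KEY, None)
    if not image:
        raise ValueError(f"config must include {IMAGE_KEY!r}")
    return image, inner


def build_docker_command(image: str, command: str, files: dict, host_dir: str) -> list[str]:
    """Build a `docker run` call; `files` maps a flag to a file name in host_dir."""
    cmd = ["docker", "run", "--rm", "-i", "-v", f"{host_dir}:{MOUNT_DIR}", image, command]
    for flag, name in files.items():
        cmd += [flag, f"{MOUNT_DIR}/{name}"]
    return cmd


def main(argv: Optional[list[str]] = None) -> int:
    argv = sys.argv[1:] if argv is None else argv
    command, rest = argv[0], argv[1:]
    # Assumes the remaining args are flag/path pairs, e.g. --config c.json.
    paths = dict(zip(rest[::2], rest[1::2]))
    if "--config" not in paths:
        # Limitation of this sketch: the image name lives in the config,
        # so even `spec` needs one.
        raise SystemExit("--config is required to locate the source image")
    with tempfile.TemporaryDirectory() as host_dir:
        files = {}
        image = ""
        for flag in FILE_FLAGS:
            if flag not in paths:
                continue
            data = json.loads(Path(paths[flag]).read_text())
            if flag == "--config":
                # Strip the injected key so the real source sees a clean config.
                image, data = split_config(data)
            name = flag.lstrip("-") + ".json"
            Path(host_dir, name).write_text(json.dumps(data))
            files[flag] = name
        # The container's stdout (Airbyte messages) is inherited unchanged,
        # so PyAirbyte reads it as if this wrapper had produced it.
        cmd = build_docker_command(image, command, files, host_dir)
        return subprocess.run(cmd, check=False).returncode
```

This also shows the "Cons" point directly: every call shells out to `docker run`, so the wrapper is only usable where Docker is installed.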
Running a Java executable would require the host system to have the right version of Java, so I wonder whether it is any better than requiring Docker. It would be a bit more difficult to manage, I'd say.
Circling back to this issue after a new option has opened up.
Users can now use Docker to run database sources if Docker is available. This feature is in "experimental" status while we gather feedback, but it should unblock use cases that require SQL-type sources or any other source written in Java.
Running Docker-based sources has now been promoted out of "experimental" status and is stable. Note: this only works if Docker is installed, which we recognize will still not be possible in some environments where you would want to run PyAirbyte.
https://airbytehq.github.io/PyAirbyte/airbyte/sources.html#get_source
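For reference, a minimal sketch of what this can look like. It assumes the `docker_image` argument described in the linked `get_source` reference, with `True` meaning "use the connector's default image". The connection values are placeholders, and `check_postgres_source` is just an illustrative name.

```python
"""Sketch: run a Java-based database source through Docker with PyAirbyte."""
import shutil


def docker_available() -> bool:
    """Docker-based sources only work when the `docker` CLI is on PATH."""
    return shutil.which("docker") is not None


def check_postgres_source():
    # Imported lazily so this module loads even where PyAirbyte isn't installed.
    import airbyte as ab

    if not docker_available():
        raise RuntimeError("Docker is required to run Java-based sources")

    source = ab.get_source(
        "source-postgres",
        # Placeholder connection settings; see the connector's spec for all fields.
        config={
            "host": "localhost",
            "port": 5432,
            "database": "postgres",
            "username": "postgres",
            "password": "example",
        },
        docker_image=True,  # run the connector image instead of a Python install
    )
    source.check()  # raises if the connection settings are wrong
    source.select_all_streams()
    return source.read()
```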