PyAirbyte
Add guidance for avoiding version conflicts with connectors and other Python CLI apps
Note: The text below is a composite of resources from various locations. We'll continue to evolve it and may move it into its own "docs" page if that's helpful to folks. Please drop a comment below or "+1" if this works for you, and also if it doesn't work for your use case. Thanks!
We often get requests for help resolving Python dependency conflicts with other libraries and Python CLI apps. I'm creating this issue to document some of the considerations, workarounds, and best practices.
Preface: The distinction between "apps" and "libraries"
For this discussion, let's say an "app" is anything with a CLI, whereas a "library" must be invoked directly from within Python.
PyAirbyte is a library, since the primary way you interact with it is via `import airbyte` in your Python code. dbt is an app, because the primary way you interact with it is via the `dbt` CLI.
This matters because all the libraries you use must coexist in the same Python environment, but the same is not true for CLI apps. The best practice for CLI apps is to install each one in its own virtual environment. Doing this manually can be cumbersome, but there are some helpful tools to streamline it.
Best Practice for installing Python CLI Apps
When installing CLI apps like dbt, the best practice is to create a dedicated virtual environment and install the app into it. This gives the most stable experience for the app itself, and it also completely decouples the app's version constraints from those of the libraries you use in the same workspace or container.
Streamlining CLI App Installation
Two very good tools make CLI app installation as easy (or almost as easy) as a normal `pip install`. The options below apply to all Python CLI apps, including tools like dbt and harlequin, and can also (optionally) be used to pre-install Airbyte connectors like airbyte-source-hubspot.
Using pipx
pipx is the original (to my knowledge) and the most widely used of the two. In most cases, you can simply run `pip install pipx` and then `pipx install my-tool`. The pipx syntax is intentionally as similar as possible to pip's, so many tools can be installed into their own dedicated virtual environment simply by replacing the word `pip` with `pipx`. (pipx also now comes preinstalled on many Python images, so you might not need to install it yourself.)
Using uv and uvx
A newer tool, uv, offers `uv tool install` and `uvx` (shorthand for `uv tool run`), which can be used much like pipx. uv is faster than pipx, but it is also less battle-tested because it is (for now) less widely used.
Common Installation Patterns
Docker-Based Pre-Installs
Some sample Dockerfile code for pre-installing connectors onto Docker images is in this comment:
- https://github.com/airbytehq/PyAirbyte/issues/78#issuecomment-2088869792
Reported to me by a user:
The trick that worked in Airflow was to use a Dockerfile that isolates the connectors by installing each one into its own virtualenv:
```dockerfile
# Pre-install the connector(s), each in its own virtualenv.
# Calling the venv's own pip avoids `source .../activate`, which /bin/sh may not support.
RUN python -m venv source_github && \
    source_github/bin/pip install --no-cache-dir airbyte-source-github
# ... repeat for other connectors ...

# Test that the executable works and we can find it
RUN source_github/bin/source-github spec

# Go ahead and install PyAirbyte as usual
RUN python -m venv pyairbyte_venv && \
    pyairbyte_venv/bin/pip install --no-cache-dir airbyte==0.10.4
```
If pipx is preinstalled on the image, this is slightly easier:
```dockerfile
# pipx creates a virtualenv per connector and links each connector's CLI into
# its bin directory (~/.local/bin by default), which must be on PATH:
RUN pipx install airbyte-source-github
RUN pipx install airbyte-source-faker

# Test that the executables work and we can find them on PATH
RUN source-github spec
RUN source-faker spec

# Go ahead and install PyAirbyte as usual
RUN python -m venv pyairbyte_venv && \
    pyairbyte_venv/bin/pip install --no-cache-dir airbyte==0.10.4
```
Installing dbt
Per this discussion: https://github.com/airbytehq/PyAirbyte/issues/441
This is slightly more involved than a normal pipx install, because dbt needs more than one package (dbt-core plus an adapter such as dbt-postgres) installed into the same virtual environment:
```shell
# Install dbt-core and the Postgres dbt adapter into the same virtual environment:
pipx install --preinstall=dbt-postgres dbt-core

# Confirm the install worked:
dbt --version
```
Related Issues:
- https://github.com/airbytehq/PyAirbyte/issues/78
- https://github.com/airbytehq/PyAirbyte/issues/441