
DBT Power User Replaces the PYTHONPATH environment variable when calling the node_python_bridge.py

Open cccs-jc opened this issue 1 year ago • 2 comments

Describe the feature

We install pyspark using the manual download as specified here https://spark.apache.org/docs/latest/api/python/getting_started/install.html#manually-downloading

In a nutshell, pyspark is distributed with the Spark tarball and you point the PYTHONPATH at your install location.

However, DBT Power User replaces the PYTHONPATH environment variable before calling the node_python_bridge.py bridge. This prevents our dbt instance from finding pyspark.

https://docs.getdbt.com/docs/core/connect-data-platform/spark-setup#session

To address this, DBT Power User could simply prepend the path to its node_python_bridge.py directory and preserve whatever was there before: export PYTHONPATH=<dbt_power_user_location>:$PYTHONPATH
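The suggested prepend behavior can be sketched as a small helper. This is not the extension's actual code; `bridge_dir` and `build_bridge_env` are hypothetical names, and the idea is simply to merge the bridge directory into the inherited environment rather than overwrite it:

```python
import os

def build_bridge_env(bridge_dir: str, base_env: dict) -> dict:
    """Prepend bridge_dir to PYTHONPATH instead of replacing it.

    bridge_dir is a hypothetical path to the directory containing
    node_python_bridge.py; base_env stands in for the environment the
    extension inherits from the user's shell.
    """
    env = dict(base_env)
    existing = env.get("PYTHONPATH", "")
    # Keep whatever PYTHONPATH the user already had (e.g. a manual
    # pyspark install) and put the bridge directory in front of it.
    env["PYTHONPATH"] = bridge_dir + (os.pathsep + existing if existing else "")
    return env

env = build_bridge_env("/ext/bridge", {"PYTHONPATH": "/opt/spark/python"})
print(env["PYTHONPATH"])
```

On POSIX this yields `/ext/bridge:/opt/spark/python`, so the bridge module wins on name collisions while the user's pyspark install stays importable.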

Describe alternatives you've considered

The alternative is to use a venv and install pyspark into it. But this means we would have pyspark installed twice, with the potential of having a different version than the Spark installed via the tarball.

Who will benefit?

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

cccs-jc avatar Feb 22 '25 19:02 cccs-jc

We don't override the Python path, the VS Code Python extension does that. Simply select the Python interpreter you want to use by following the instructions on https://docs.myaltimate.com/setup/reqdConfig/#associate-python-interpreter-with-dbt-installation

mdesmet avatar Feb 23 '25 01:02 mdesmet

In VS Code I specify the Python interpreter and point it to my venv. However, my venv does not contain a pyspark installation; our pyspark installation is located via the PYTHONPATH environment variable.

But when I retrieve this environment variable inside the Python code invoked by DBT Power User, it is set to the directory where the node_python_bridge.py file is located, and thus the pyspark library cannot be loaded.
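The check described above can be reproduced with a short diagnostic run inside the bridge process. This is an illustrative sketch, not code from the extension; it just reports what the spawned Python actually sees:

```python
import importlib.util
import os
import sys

def diagnose_pyspark_visibility() -> dict:
    """Report what the bridge process sees.

    If PYTHONPATH contains only the extension's bridge directory,
    pyspark_importable will be False for a manual (tarball) install.
    """
    return {
        "PYTHONPATH": os.environ.get("PYTHONPATH", "<unset>"),
        "pyspark_importable": importlib.util.find_spec("pyspark") is not None,
        "sys_path_head": sys.path[:3],
    }

print(diagnose_pyspark_visibility())
```

Running this through DBT Power User versus a plain terminal shows whether the inherited PYTHONPATH survived the extension's environment handling.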

cccs-jc avatar Feb 24 '25 13:02 cccs-jc