DBT Power User replaces the PYTHONPATH environment variable when calling node_python_bridge.py
Describe the feature
We install pyspark using the manual download described here: https://spark.apache.org/docs/latest/api/python/getting_started/install.html#manually-downloading
In a nutshell, pyspark is distributed with the Spark tarball, and you point PYTHONPATH at your install location.
However, DBT Power User replaces the PYTHONPATH environment variable before calling the node_python_bridge.py bridge. This causes our DBT instance to be unable to find pyspark.
https://docs.getdbt.com/docs/core/connect-data-platform/spark-setup#session
To address this, DBT Power User could simply prepend the path to its node_python_bridge.py directory and leave whatever was there before: export PYTHONPATH=<dbt_power_user_location>:$PYTHONPATH
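A minimal sketch of the prepend behaviour proposed above, written in Python for illustration. The function name `prepend_pythonpath` and the example paths are hypothetical; this is not the extension's actual code, only one way the environment for the bridge process could be built without discarding the existing value.

```python
import os


def prepend_pythonpath(env, bridge_dir):
    """Return a copy of env with bridge_dir placed first on PYTHONPATH.

    env is a mapping such as os.environ; bridge_dir is the directory that
    contains node_python_bridge.py (a hypothetical path in this sketch).
    """
    new_env = dict(env)
    existing = new_env.get("PYTHONPATH", "")
    # Keep every entry the user already had and only add the bridge
    # directory in front, skipping empty entries.
    parts = [bridge_dir] + [p for p in existing.split(os.pathsep) if p]
    new_env["PYTHONPATH"] = os.pathsep.join(parts)
    return new_env
```

Using `os.pathsep` rather than a hard-coded `:` keeps the same logic valid on Windows, where the separator is `;`. The resulting mapping could then be passed as the `env` argument when the bridge process is spawned.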
Describe alternatives you've considered
The alternative is to use a venv and install pyspark into it. But this means we would have pyspark installed twice, with the risk that the venv copy is a different version than the Spark installed via the tarball.
Who will benefit?
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
We don't override the Python path; the VS Code Python extension does that. Simply select the Python path you want to use by following the instructions at https://docs.myaltimate.com/setup/reqdConfig/#associate-python-interpreter-with-dbt-installation
In VS Code I specify the Python interpreter and point it to my venv. However, my venv does not contain a pyspark installation. Our pyspark installation is located through the PYTHONPATH environment variable.
But when I read this environment variable inside the Python code invoked by DBT Power User, it is set to the directory where the node_python_bridge.py file is located, and thus the pyspark library cannot be loaded.
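For anyone trying to reproduce this, a small diagnostic along the following lines can be run from Python code that DBT Power User invokes. It is only a sketch: the helper name `describe_python_env` is made up for illustration, and it reports what the process sees without saying which component changed the variable.

```python
import importlib.util
import os
import sys


def describe_python_env(module_name="pyspark"):
    """Collect what this Python process sees for PYTHONPATH and sys.path.

    module_name should be a top-level module name; find_spec returns None
    when such a module cannot be located on sys.path.
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "PYTHONPATH": os.environ.get("PYTHONPATH"),  # None if unset
        "sys_path": list(sys.path),
        "module_found": spec is not None,
    }
```

Comparing the output from a plain terminal session with the output from inside the extension should show whether the original PYTHONPATH entries survive.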