spark-deep-learning
python dependencies are not downloaded along with the spark package
When I run a Spark job with this library pulled in as a package, I get an error that tensorflow is not found. I would expect that downloading this library as a package would also pull in the necessary Python dependencies. If that's not the case, what's the recommended way to include them?
There is a lot of discussion of approaches to handling PySpark dependencies (a sketch of one common approach follows the links):
- https://florianwilhelm.info/2018/03/isolated_environments_with_pyspark/
- https://developerzen.com/best-practices-writing-production-grade-pyspark-jobs-cb688ac4d20f#.wg3iv4kie
This question is a more general version of my other question re: Dataproc.
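For concreteness, here is a minimal sketch of the environment-archive approach described in the first link, assuming a YARN deployment. The archive name `environment.tar.gz` and the paths are placeholders; the archive would be built beforehand (e.g. with conda-pack) so that it contains tensorflow and the other Python dependencies:

```python
# Minimal sketch (illustrative names): ship a packed conda/virtualenv archive
# to the executors and point the Python workers at its interpreter.
import os
from pyspark.sql import SparkSession

# Python workers should use the interpreter inside the unpacked archive.
os.environ["PYSPARK_PYTHON"] = "./environment/bin/python"

spark = (
    SparkSession.builder
    .appName("packaged-python-deps")
    # On YARN; "environment.tar.gz" is unpacked on each node as "environment".
    .config("spark.yarn.dist.archives", "environment.tar.gz#environment")
    .getOrCreate()
)
```

Note that `--py-files`/`addPyFile` only works for pure-Python code; tensorflow ships native extensions, so it has to be either installed on every node or shipped as a full environment archive like this.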
Can you post your stack trace? It's possible that the Spark executors, not the master, are missing the dependencies. Can you also post your environment setup?
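If it helps, here is a minimal sketch (names are illustrative) that attempts the import on the executors, so you can tell whether the failure is on the driver or on the workers:

```python
# Run "import tensorflow" inside executor tasks rather than on the driver.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tf-dep-check").getOrCreate()

def check_tf(_partition):
    try:
        import tensorflow  # noqa: F401
        yield "tensorflow OK"
    except ImportError as e:
        yield "tensorflow MISSING: {}".format(e)

# Several partitions so the import runs on the executors, not the driver.
print(spark.sparkContext.parallelize(range(4), 4).mapPartitions(check_tf).collect())
```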
I understand your question is about dependencies in general. In this particular case, installing tensorflow would make the error go away: sparkdl is unable to find the TensorFlow backend, hence the error.
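A quick way to confirm the fix, assuming the sparkdl import itself is what raises when the TensorFlow backend is absent:

```python
# Sanity check after installing tensorflow on the driver. Remember the
# executors need tensorflow as well (see the executor check above).
import tensorflow as tf
print("tensorflow", tf.__version__)

from sparkdl import DeepImageFeaturizer  # succeeds once tensorflow is present
```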