spark-deep-learning

Python dependencies are not downloaded along with the Spark package

Open skeller88 opened this issue 5 years ago • 2 comments

When I run a Spark job with this library downloaded as a package, I get an error that tensorflow is not found. I would expect that downloading this library as a package would pull in the necessary Python dependencies. If that's not the case, what's the recommended way to include them?

There is a lot of discussion on approaches to handling pyspark dependencies:

  • https://florianwilhelm.info/2018/03/isolated_environments_with_pyspark/
  • https://developerzen.com/best-practices-writing-production-grade-pyspark-jobs-cb688ac4d20f#.wg3iv4kie
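One approach both links above discuss is shipping a self-contained Python environment alongside the job, so the executors get the same packages as the driver. A minimal sketch using conda-pack, assuming a YARN cluster; the environment name, archive name, and job script below are placeholders, not anything from this repo:

```shell
# Build an environment that includes the missing dependency (placeholder names).
conda create -y -n sparkdl_env python=3.7 tensorflow
conda pack -n sparkdl_env -o sparkdl_env.tar.gz

# Ship the packed env to executors; '#environment' is the unpack directory,
# and spark.pyspark.python points executors at the Python inside it.
spark-submit \
  --archives sparkdl_env.tar.gz#environment \
  --conf spark.pyspark.python=./environment/bin/python \
  my_job.py
```

For small pure-Python dependencies, `--py-files deps.zip` is a lighter alternative, but it does not work for packages with native extensions like tensorflow, which is why the packed-environment route is usually recommended.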

This question is a more general version of my other question re: Dataproc.

skeller88 avatar Dec 27 '19 01:12 skeller88

Can you post your stack trace? It's possible that the Spark executors are missing the dependencies even though the master has them. Can you also post your environment setup?
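One way to tell whether the executors (rather than the driver) are the ones missing tensorflow is to map a small import check across the cluster. A sketch; the helper runs anywhere, and the cluster call is shown commented out because it assumes a live `SparkContext` named `sc`:

```python
import importlib.util

def has_module(name):
    """Return True if `name` is importable in the current interpreter."""
    return importlib.util.find_spec(name) is not None

# Driver-side check:
print(has_module("tensorflow"))

# Executor-side check (hypothetical usage, requires a running SparkContext):
# sc.parallelize(range(sc.defaultParallelism)) \
#   .map(lambda _: has_module("tensorflow")) \
#   .collect()
```

If the driver prints `True` but the collected list contains `False`, the dependency is installed only on the master node.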

Ben-Epstein avatar Dec 27 '19 01:12 Ben-Epstein

I understand your question is about dependencies in general. In this particular case, installing tensorflow would make the error go away: sparkdl cannot find the TensorFlow backend, hence the error.

spark-water avatar Feb 10 '20 18:02 spark-water