How to add jars to a custom container?

Open · evanye opened this issue 4 years ago • 11 comments

Creating a new issue from the question in https://github.com/databricks/containers/issues/11

Is there a way to also install jars in a custom container so they end up on the Spark classpath after launch?

evanye avatar May 18 '20 17:05 evanye

Try adding your jars to /databricks/jars. This folder is on the classpath and will be included when we start Spark.
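
For example, in the image's Dockerfile (a minimal sketch; the base image tag and jar name are placeholders):

    FROM databricksruntime/standard:latest
    # Jars placed in /databricks/jars end up on the Spark classpath at cluster launch.
    COPY my-library.jar /databricks/jars/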

evanye avatar May 18 '20 17:05 evanye

@evanye How do I add files to the executor classpath in a custom Databricks container? I am using "--files" and "spark.executor.extraLibraryPath", and the files are being recognised by the driver, but somehow they are not being propagated to the executors.

rd-rohit avatar Jan 21 '21 10:01 rd-rohit

@rd-rohit I think executors also use /databricks/jars - can you try without the additional options?

evanye avatar Jan 21 '21 18:01 evanye

Sure, thank you @evanye. I will try keeping the files in the /databricks/jars directory and see if it works.

rd-rohit avatar Jan 22 '21 06:01 rd-rohit

@evanye I tried; it didn't work by adding them to /databricks/jars. Instead I had to keep the jars and files in a new directory and then pass them as arguments to the spark-submit command via --files and --jars.
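
Roughly like this (the paths, class name, and application jar are placeholders):

    spark-submit \
      --jars /custom/jars/my_jar.jar \
      --files /custom/conf/app.conf \
      --class com.example.Main \
      /custom/jars/my_app.jar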

rd-rohit avatar Jan 22 '21 10:01 rd-rohit

@rd-rohit good to know, thanks! The classpath dependencies here can be complicated.

evanye avatar Apr 14 '21 21:04 evanye

@evanye, thanks for the follow-up. I was able to execute the run successfully. It was actually an issue with the working directory: I learned that if we pass jars and files via --jars and --files respectively, their working directory is different from that of the jars located in /databricks/jars.

But I do have one follow-up question. I don't know if it is possible, but can the jars present in /databricks/jars access the files passed via --files, which the executors load into a separate temp working directory?

PS: I tried, but it didn't work for me; the jars in /databricks/jars were not able to locate the files present in the temp working directory. I just wanted to know if there is an alternate approach that I may be unable to find.
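
For reference, the usual way code resolves a file shipped via --files is through SparkFiles rather than a relative path; a PySpark sketch (the file name here is hypothetical):

    from pyspark import SparkFiles

    # SparkFiles.get returns the absolute path of a file distributed via
    # --files (or SparkContext.addFile), wherever the temp working dir is.
    conf_path = SparkFiles.get("app.conf")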

Thanks, Rohit.

rd-rohit avatar Apr 15 '21 14:04 rd-rohit

@evanye I am running into the same issue attempting to class-load jars in a custom container deployed to Databricks. Even after adding the jars to /databricks/jars, I still see errors indicating the jars were not loaded.

a0x8o avatar Aug 05 '21 13:08 a0x8o

@a0x8o Which DBR version? What error message?

evanye avatar Aug 05 '21 17:08 evanye

For the people still trying to figure this out: what I did was create an init script and store it within the container image. The script simply copies whichever jar file you want into /databricks/jars, e.g. cp /custom/jars/my_jar.jar /databricks/jars

This way the jar is loaded onto the classpath automatically, you don't need to include it as part of your spark-submit, and you can also use it from the notebook environment.
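
A minimal version of such an init script (using the example path from above):

    #!/bin/bash
    # Runs at cluster start; copies custom jars into the folder that
    # Databricks puts on the Spark classpath.
    cp /custom/jars/my_jar.jar /databricks/jars/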

j-mechacorta avatar Jan 24 '23 13:01 j-mechacorta

Just add your jars to /databricks/python3/lib/python3.10/site-packages/pyspark/jars. This is where PySpark keeps its jars.
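
For example, as a Dockerfile step (the python3.10 path segment matches runtimes that ship Python 3.10, so adjust it for your image; the jar name is a placeholder):

    COPY my_jar.jar /databricks/python3/lib/python3.10/site-packages/pyspark/jars/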

Nicbyte avatar Mar 11 '24 14:03 Nicbyte