
Adding JDBC postgres driver connection for external db to environment / jupyter

RobertSellers opened this issue 6 years ago • 2 comments

What Docker image are you using?

pyspark-notebook on an Ubuntu 18.04 server.

What complete docker command do you run to launch the container?

I am using a docker-compose.yaml with three services: jupyter, spark-master, and spark-worker-1. For the Spark services, I am running (approximately):

spark-master:
  image: "pyspark-notebook"
  command: /home/jovyan/start-spark.sh
  volumes:
    - /local/scratch-drive/:/scratch
    - /local/work/:/usr/local/spark/work

spark-worker-1:
  image: "pyspark-notebook"
  command: /home/jovyan/start-spark-worker.sh
  volumes:
    - /local/scratch-drive/:/scratch
    - /local/work/:/usr/local/spark/work

What steps do you take once the container is running to reproduce the issue?

I am not super familiar with Java, but I have tried a variety of things. Mainly, I have been trying to run the jaydebeapi Python library inside Jupyter and point it at the three driver .jar files (one primary driver and two dependencies); we'll call them driver.jar, dependency1.jar, and dependency2.jar. I have run the following (the jclassname and URL are not real) to no avail:

import os
import jaydebeapi

path = "/path/with/three_driver_jars/"
conn = jaydebeapi.connect(jclassname='com.example.jdbc.Driver',
                          url='jdbc:https://0.0.0.0.0/sql:.',  # not a real URL
                          driver_args=[user, pw],  # credentials defined elsewhere
                          jars=os.listdir(path))
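
A variant with absolute jar paths would look roughly like the sketch below; the jars argument is used to build the JVM classpath, so it needs paths the JVM can actually resolve, while os.listdir returns only bare file names. The class name, URL, and credentials remain placeholders:

import os
import jaydebeapi

path = "/path/with/three_driver_jars/"
# Build absolute paths to every jar in the directory; bare file names
# from os.listdir cannot be resolved on the JVM classpath.
jar_paths = [os.path.join(path, f) for f in os.listdir(path) if f.endswith(".jar")]

conn = jaydebeapi.connect(jclassname='com.example.jdbc.Driver',    # placeholder class
                          url='jdbc:https://0.0.0.0.0/sql:.',      # placeholder URL
                          driver_args=['db_user', 'db_password'],  # placeholder credentials
                          jars=jar_paths)
conn.close()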

What do you expect to happen?

I expect to create a connection. I have been able to compile simple .java scripts on a local Windows machine that read out records, but I cannot get this JDBC configuration working inside this Docker setup. I have tried moving the .jars into the JRE directory, editing spark-defaults.conf (see example below), and adding CLASSPATH environment variables.

(inside spark-defaults.conf)

spark.driver.extraClassPath /shared_jars/driver.jar:/shared_jars/dependency1.jar:/shared_jars/dependency2.jar
spark.executor.extraClassPath /shared_jars/driver.jar:/shared_jars/dependency1.jar:/shared_jars/dependency2.jar
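
For reference, the equivalent from inside the notebook would be to register the same jars on the SparkSession and read over JDBC. This is only a rough sketch: the JDBC URL, driver class, table name, and credentials are placeholders, and the master URL assumes the standalone cluster from the compose file above on the default port.

from pyspark.sql import SparkSession

jars = ",".join([
    "/shared_jars/driver.jar",
    "/shared_jars/dependency1.jar",
    "/shared_jars/dependency2.jar",
])

spark = (SparkSession.builder
         .appName("jdbc-test")
         .master("spark://spark-master:7077")  # assumed standalone master from the compose file
         .config("spark.jars", jars)           # distributes the jars to the driver and executors
         .getOrCreate())

df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/mydb")  # placeholder JDBC URL
      .option("driver", "com.example.jdbc.Driver")           # placeholder driver class
      .option("dbtable", "some_table")                       # placeholder table
      .option("user", "db_user")                             # placeholder credentials
      .option("password", "db_password")
      .load())
df.show()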

What actually happens?

I am routinely confronted with the following error:

java.lang.RuntimeExceptionPyRaisable: java.lang.RuntimeException: Class com.example.jdbc.Driver not found

For more clarity, I have successfully run the following script on Windows, compiled against the three .jars, and it outputs records, so the driver and credentials seem to be OK.

public class ConnectTest {
    public static void main(String[] args) {
        java.sql.Driver driver = new com.example.jdbc.Driver();
        java.util.Properties info = new java.util.Properties();
        info.put("user", username);      // credentials defined elsewhere
        info.put("password", password);
        java.sql.Connection conn = driver.connect("https://0.0.0.0.0/sql", info);
        // ... query and print records ...
        conn.close();
    }
}

Am I missing something here? Is the Java environment used for Spark incompatible or in need of modification? Or is this something I can hack together inside the current container? I am very new to Java, but have some decent experience with Python and Docker.

RobertSellers avatar Apr 16 '19 20:04 RobertSellers

We're starting to experiment with a general Q&A section on https://discourse.jupyter.org/c/questions to see if cross-technology questions like this one catch more attention from a broader community audience. You might try re-posting your question over there to see if someone with more experience in this topic can help.

If you do post the question again on the Discourse site, feel free to leave a link in a comment here for those that happen upon this closed issue.

parente avatar Apr 21 '19 20:04 parente

I don't know if your problem is specifically that the JDBC driver isn't installed, but I had this problem with the MS SQL Server JDBC driver. I tried a couple of things and nothing seemed to work (adding the jar in the %%init_spark magic, since I use Scala, and some stuff like that).

What I finally had to do was manually copy the JDBC driver's jar file into the classpath folder, which defaults to something like /usr/local/spark/jars.

I did this by using the docker cp command to get the file from the host into that specific path in the container. After doing that, I had no more problems with that driver.
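
A quick way to confirm from inside the notebook that the copied jar actually landed in that directory (which is on Spark's default classpath) is a sketch like this; the file-name filter is only an example:

import os

spark_jars = "/usr/local/spark/jars"  # Spark's default jar directory, as above
all_jars = sorted(f for f in os.listdir(spark_jars) if f.endswith(".jar"))
# The copied JDBC driver jar (its actual file name will differ) should show up here.
print([f for f in all_jars if "jdbc" in f.lower() or "driver" in f.lower()])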

raderas avatar Jun 21 '21 23:06 raderas

@RobertSellers, could you please tell us whether you were able to resolve this issue? Did @raderas's solution help you? Or are you perhaps no longer interested in this issue?

mathbunnyru avatar Oct 19 '22 07:10 mathbunnyru

Closing this one, since no response was received.

mathbunnyru avatar Oct 29 '22 20:10 mathbunnyru