
How to connect Superset with Apache Drill using JDBC URL

Open kaIeidoscopic opened this issue 1 year ago • 2 comments

Hi team, thanks for this powerful tool. I'm running Superset 1.4.2 in a Docker container on a Kubernetes cluster. I previously connected to Drill through the REST interface but ran into performance issues, so I'm now trying to connect to the Drill cluster using a JDBC URL. I have installed a JDK, JayDeBeApi, and JPype in the container.

When I try to create a database in the Superset UI with this URL:

drill+jdbc://<my host>:31010

Superset throws a "JVM not started" error.

I don't know where I should start the JVM in Superset, so I tried adding the code below before the call to engine.raw_connection():

import jpype

if not jpype.isJVMStarted():
    jpype.startJVM()
https://github.com/apache/superset/blob/master/superset/databases/commands/test_connection.py#L100
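For illustration, the guard above could be wrapped in a small helper that also passes the driver jar to the JVM explicitly. This is only a sketch under assumptions: the helper names `drill_classpath` and `ensure_jvm` and the jar location are hypothetical, and it presumes the Drill driver is not already on the CLASSPATH. It does not address the error on the second connection test.

```python
import os


def drill_classpath(jar_path):
    """Return the classpath list handed to the JVM (only the Drill driver jar here)."""
    return [os.path.abspath(jar_path)]


def ensure_jvm(jar_path):
    """Start the JVM if it is not running yet; otherwise do nothing."""
    # Imported lazily so this module can be loaded without JPype installed.
    import jpype

    if not jpype.isJVMStarted():
        # classpath is a keyword argument of jpype.startJVM.
        jpype.startJVM(classpath=drill_classpath(jar_path))
    return jpype.isJVMStarted()


# Hypothetical usage; the path is an assumption, adjust it to your container:
# ensure_jvm("/app/drivers/drill-jdbc-all-1.19.0.jar")
```

A JVM can only be started once per process with JPype, so the `isJVMStarted()` check matters when the helper may be called on every connection attempt.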

After this change, when I click the TEST CONNECTION button in Superset, the first attempt returns 'connection looks good', but the second attempt returns the error below.

ERROR: (java.sql.java.sql.SQLException) java.sql.SQLException: Failure in creating DrillConnectionImpl: java.lang.NullPointerException (Background on this error at: http://sqlalche.me/e/13/dbapi)

I'm not sure whether this is a bug or whether I'm using it incorrectly. Could you please provide more details on how to connect Superset to Apache Drill using a JDBC URL?

sqlalchemy-drill version: 1.1.2
Superset version: 1.4.2
JDBC driver: drill-jdbc-all-1.19.0.jar
JPype version: 1.4.1
JayDeBeApi version: 1.2.3

Thank you

kaIeidoscopic avatar Nov 10 '22 10:11 kaIeidoscopic

I am not sure exactly what performance issues you are facing, but if you can switch to JPype's dbapi2 rather than JayDeBeApi, you may get better performance. JayDeBeApi is a common bridge usable from both Jython and Python, whereas JPype dbapi2, which is part of the JPype project, targets CPython only but is 2 to 3 times faster. If you need even faster access, you may want to consider pulling data column-wise rather than row-wise using Apache Arrow.

I would recommend looking at this blog post.

https://uwekorn.com/2020/12/30/fast-jdbc-revisited.html
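A rough sketch of what a direct JPype dbapi2 connection to Drill might look like is below. The helper names `drill_jdbc_url` and `fetch_rows` are made up for illustration, and the sketch assumes the JVM has already been started with the Drill JDBC jar on its classpath. It talks to Drill directly and does not show how to wire this into Superset or sqlalchemy-drill.

```python
def drill_jdbc_url(host, port=31010):
    """Build a JDBC URL that connects straight to a drillbit."""
    return f"jdbc:drill:drillbit={host}:{port}"


def fetch_rows(host, sql, port=31010):
    """Run one query through jpype.dbapi2 and return all rows."""
    # Imported lazily so this module can be loaded without JPype installed.
    import jpype
    import jpype.dbapi2 as dbapi2

    if not jpype.isJVMStarted():
        raise RuntimeError("start the JVM with the Drill driver on the classpath first")

    # Assumption: org.apache.drill.jdbc.Driver is the class shipped in drill-jdbc-all.
    conn = dbapi2.connect(drill_jdbc_url(host, port), driver="org.apache.drill.jdbc.Driver")
    try:
        cur = conn.cursor()
        try:
            cur.execute(sql)
            return cur.fetchall()
        finally:
            cur.close()
    finally:
        conn.close()


# Hypothetical usage; the host name is a placeholder:
# rows = fetch_rows("drill.example.internal", "SELECT * FROM sys.version")
```

This still pulls results row by row; the column-wise Arrow approach from the blog post is a separate technique and is not shown here.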

Thrameos avatar Nov 10 '22 21:11 Thrameos

I'm tempted to move this to a Q&A thread since it's not a bug, per se. However, the performance issue you mention might indeed be a bug worth keeping here. If you have any additional details on the performance problem, that would help immensely.

Once we have that, I'll try to relay this to folks who might have more context and/or workarounds.

rusackas avatar Nov 30 '22 00:11 rusackas

Hello @kaIeidoscopic, as this issue has been stale for a long time and we are no longer supporting this version of Superset, I will be closing it for now. Thank you!

geido avatar Feb 05 '24 18:02 geido