dataframe
dataframe copied to clipboard
Redshift not supported
Attempted to try out the new SQL support.
I used the Amazon provided JDBC driver and the connection oriented APIs. Redshift is a flavor of postgres and the driver should handle the differences/mappings so I figured it may just work.
Example Kotlin notebook
%use dataframe(0.12.0)
@file:DependsOn("../redshift-jdbc42-2.1.0.24.jar")
import com.amazon.redshift.jdbc.Driver
import com.amazon.redshift.jdbc.DataSource
data class DatabaseConfig(
val endpoint: String,
val database: String,
val user: String,
val password: String,
val port: Int
) {
val url: String = "jdbc:redshift://$endpoint:$port/$database"
}
val dbConfig = DatabaseConfig(...)
val ds = DataSource().apply {
setURL(dbConfig.url)
user = dbConfig.user
password = dbConfig.password
}
val query = "..."
val df = ds.getConnection().use { conn ->
DataFrame.readSqlQuery(conn, query)
}
Results in:
Only H2, MariaDB, MySQL, SQLite and PostgreSQL are supported!
java.lang.IllegalArgumentException: Unsupported database type in the url: jdbc:redshift://<redacted>
Only H2, MariaDB, MySQL, SQLite and PostgreSQL are supported!
at org.jetbrains.kotlinx.dataframe.io.db.UtilKt.extractDBTypeFromUrl(util.kt:20)
at org.jetbrains.kotlinx.dataframe.io.ReadJdbcKt.readSqlQuery(readJdbc.kt:183)
at org.jetbrains.kotlinx.dataframe.io.ReadJdbcKt.readSqlQuery(readJdbc.kt:168)
at Line_22_jupyter.<init>(Line_22.jupyter.kts:12)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
at kotlin.script.experimental.jvm.BasicJvmScriptEvaluator.evalWithConfigAndOtherScriptsResults(BasicJvmScriptEvaluator.kt:105)
at kotlin.script.experimental.jvm.BasicJvmScriptEvaluator.invoke$suspendImpl(BasicJvmScriptEvaluator.kt:47)
at kotlin.script.experimental.jvm.BasicJvmScriptEvaluator.invoke(BasicJvmScriptEvaluator.kt)
at kotlin.script.experimental.jvm.BasicJvmReplEvaluator.eval(BasicJvmReplEvaluator.kt:49)
at org.jetbrains.kotlinx.jupyter.repl.impl.InternalEvaluatorImpl$eval$resultWithDiagnostics$1.invokeSuspend(InternalEvaluatorImpl.kt:107)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:284)
at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
at org.jetbrains.kotlinx.jupyter.repl.impl.InternalEvaluatorImpl.eval(InternalEvaluatorImpl.kt:107)
at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl$execute$1$result$1.invoke(CellExecutorImpl.kt:78)
at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl$execute$1$result$1.invoke(CellExecutorImpl.kt:76)
at org.jetbrains.kotlinx.jupyter.ReplForJupyterImpl.withHost(repl.kt:681)
at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl.execute(CellExecutorImpl.kt:76)
at org.jetbrains.kotlinx.jupyter.repl.CellExecutor$DefaultImpls.execute$default(CellExecutor.kt:16)
at org.jetbrains.kotlinx.jupyter.ReplForJupyterImpl$evalEx$1.invoke(repl.kt:496)
at org.jetbrains.kotlinx.jupyter.ReplForJupyterImpl$evalEx$1.invoke(repl.kt:474)
at org.jetbrains.kotlinx.jupyter.ReplForJupyterImpl.withEvalContext(repl.kt:437)
at org.jetbrains.kotlinx.jupyter.ReplForJupyterImpl.evalEx(repl.kt:474)
at org.jetbrains.kotlinx.jupyter.messaging.ProtocolKt$shellMessagesHandler$2$res$1.invoke(protocol.kt:320)
at org.jetbrains.kotlinx.jupyter.messaging.ProtocolKt$shellMessagesHandler$2$res$1.invoke(protocol.kt:314)
at org.jetbrains.kotlinx.jupyter.JupyterExecutorImpl$runExecution$execThread$1.invoke(execution.kt:38)
at org.jetbrains.kotlinx.jupyter.JupyterExecutorImpl$runExecution$execThread$1.invoke(execution.kt:33)
at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)
It's unclear to me how much work it would be to support Redshift or if using the custom provided driver if it should just work out of the box (using the connection oriented APIs of course). It seems like perhaps the connection oriented APIs though shouldn't be checking the database type or perhaps an escape hatch to disable it?
Building the project locally with a custom driver seems to work, haven't tested it much yet:
/**
* Represents the Amazon Redshift database type.
*/
public object Redshift: DbType("redshift") {
override val driverClassName: String
get() = "com.amazon.redshift.Driver"
override fun convertSqlTypeToColumnSchemaValue(tableColumnMetadata: TableColumnMetadata): ColumnSchema? = null
// not supported
// see https://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-features.html
override fun isSystemTable(tableMetadata: TableMetadata): Boolean = false
override fun buildTableMetadata(tables: ResultSet): TableMetadata {
return TableMetadata(
tables.metaData.getTableName(1),
tables.metaData.getSchemaName(1),
tables.metaData.getCatalogName(1),
)
}
override fun convertSqlTypeToKType(tableColumnMetadata: TableColumnMetadata): KType? = null
}
public fun extractDBTypeFromUrl(url: String?): DbType {
if (url != null) {
return when {
H2.dbTypeInJdbcUrl in url -> H2
MariaDb.dbTypeInJdbcUrl in url -> MariaDb
MySql.dbTypeInJdbcUrl in url -> MySql
Sqlite.dbTypeInJdbcUrl in url -> Sqlite
PostgreSql.dbTypeInJdbcUrl in url -> PostgreSql
Redshift.dbTypeInJdbcUrl in url -> Redshift
else -> throw IllegalArgumentException("Unsupported database type in the url: $url. " +
"Only H2, MariaDB, MySQL, SQLite, PostgreSQL, and Amazon Redshift are supported!")
}
} else {
throw SQLException("Database URL could not be null. The existing value is $url")
}
}
I'm not well versed in JDBC or Redshift so there may be a better way to do some of this. There may also be more to it to actually fully support Redshift IDK. I looked a the postgres type for inspiration. All of the customization centers around money
SQL type which isn't supported in Redshift anyway.
It may be nice to allow registering support for custom DB types using a custom driver or perhaps baking in support for Redshift if possible with instructions on how to add the custom driver and throw an exception if not found.
@zaleslaw is actively working on JDBC support right now and especially the type mapping is bound to undergo some changes in the future. Maybe he can help with a temporary solution in the meantime? Registering custom drivers would be awesome indeed!