[v1.2.1 Regression] Can't connect to S3 with `v1.2.1` while it works with `v1.2.0`
Hey 🙂
We have updated DuckDB from v1.2.0 to v1.2.1 and, after running some tests, we have confirmed that there's a regression in v1.2.1.
In v1.2.0, DuckDB could connect to our S3 to fetch data.
Since v1.2.1, DuckDB fails with the following error:
HTTP Error: HTTP GET error on 'https://<my_bucket>.s3.amazonaws.com/8440bb77-ca6f-4e70-b125-1591b410add0/small.csv' (HTTP 403)
Stack trace (we use the Java integration):
org.duckdb.DuckDBNative.duckdb_jdbc_execute(Native Method)
org.duckdb.DuckDBPreparedStatement.execute(DuckDBPreparedStatement.java:148)
org.duckdb.DuckDBPreparedStatement.execute(DuckDBPreparedStatement.java:127)
org.duckdb.DuckDBPreparedStatement.executeQuery(DuckDBPreparedStatement.java:180)
org.duckdb.DuckDBPreparedStatement.executeQuery(DuckDBPreparedStatement.java:208)
Seems like DuckDB fails to pick the credentials from the AWS server env 🤔
Here's what we do: (Scala code)
val conn = DriverManager.getConnection("jdbc:duckdb:").unwrap(classOf[DuckDBConnection])
val stmt = conn.createStatement()
stmt.execute(
"""
|CREATE SECRET (
| TYPE S3,
| PROVIDER CREDENTIAL_CHAIN
|);
|INSTALL iceberg;
|INSTALL aws;
|LOAD iceberg;
|LOAD aws;
""".stripMargin.trim
)
stmt.executeQuery("FROM sniff_csv('s3://<my_bucket>/8440bb77-ca6f-4e70-b125-1591b410add0/small.csv')")
This code works with v1.2.0 but fails with the error mentioned above with v1.2.1.
Cc @guizmaii
Hey @jivanic-demystdata, thanks for reporting. Would you mind posting the output of running `pragma extension_versions;` in DuckDB?
~also, where are your credentials located? (e.g. in env vars, the ~/.aws/credentials file, etc)~ nvm i did not read properly..
I did this:
//> using dep org.duckdb:duckdb_jdbc:1.2.1
import java.sql.DriverManager
import org.duckdb.{DuckDBConnection, DuckDBResultSet, DuckDBStruct}
val conn = DriverManager.getConnection("jdbc:duckdb:").unwrap(classOf[DuckDBConnection])
val stmt = conn.createStatement()
stmt.execute(
"""
|CREATE SECRET (
| TYPE S3,
| PROVIDER CREDENTIAL_CHAIN
|);
|INSTALL iceberg;
|INSTALL aws;
|LOAD iceberg;
|LOAD aws;
""".stripMargin.trim
)
val rs = stmt.executeQuery("pragma extension_versions;")
val metaData = rs.getMetaData
val columnCount = metaData.getColumnCount
// Print column names (optional)
for (i <- 1 to columnCount) {
print(metaData.getColumnName(i) + "\t")
}
println()
// Iterate through results
while (rs.next()) {
// Access each column by index
for (i <- 1 to columnCount) {
// getString works for most data types in a display context
val s = rs.getString(i)
// Guard against SQL NULL (getString returns null) as well as empty strings
if (s == null || s.isBlank) print("null\t") else print(s + "\t")
}
println()
}
This code is a scala-cli script. To run it:
- Install scala-cli. See https://scala-cli.virtuslab.org/install
- Create a new file named `duck.sc`
- Copy/paste the previous code into the `duck.sc` file
(I print null when there's no value just to make it easier to format the data below)
And it prints this:
extension_name extension_version install_mode installed_from
aws b3050f3 REPOSITORY core
core_functions null STATICALLY_LINKED null
httpfs 85ac466 REPOSITORY core
iceberg 43b4e37 REPOSITORY core
icu null STATICALLY_LINKED null
json null STATICALLY_LINKED null
parquet null STATICALLY_LINKED null
@samansmink Is this fixed in 1.2.2?
Just for the record: I was looking at this from the JDBC side and am adding details from Discord. Assuming that IAM roles for service accounts (IRSA) worked with v1.2.0 and broke in v1.2.1, it looks like only the changes in duckdb/duckdb-httpfs#21 and #68 were added between these two versions. I cannot tell whether these changes are relevant to the IRSA problem, so I'm leaving this to @samansmink.
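For anyone trying to narrow this down, here is a minimal sketch of a check for which AWS credential-related environment variables the JVM process can see. It is an assumption on my side that this helps: `AwsEnvCheck` is a hypothetical helper, not part of DuckDB or its JDBC driver, and it only inspects the environment. It does not reproduce how DuckDB's `CREDENTIAL_CHAIN` provider resolves credentials. The variable names are the standard ones read by the AWS SDKs. IRSA normally relies on `AWS_ROLE_ARN` together with `AWS_WEB_IDENTITY_TOKEN_FILE`.

```scala
// Hypothetical diagnostic helper; not part of DuckDB or the DuckDB JDBC driver.
object AwsEnvCheck {
  // Environment variables commonly read by the AWS SDK credential chain.
  val credentialVars: Seq[String] = Seq(
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_PROFILE",
    "AWS_ROLE_ARN",                           // web identity (used by IRSA)
    "AWS_WEB_IDENTITY_TOKEN_FILE",            // web identity (used by IRSA)
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI", // container credentials
    "AWS_CONTAINER_CREDENTIALS_FULL_URI"      // container credentials
  )

  // Names of the variables above that are set to a non-empty value.
  // Only names are returned, so secret values never end up in logs.
  def presentVars(env: Map[String, String]): Seq[String] =
    credentialVars.filter(name => env.get(name).exists(_.nonEmpty))

  // True when both variables needed for web identity credentials are set.
  def usesWebIdentity(env: Map[String, String]): Boolean = {
    val present = presentVars(env).toSet
    present("AWS_ROLE_ARN") && present("AWS_WEB_IDENTITY_TOKEN_FILE")
  }
}
```

In a scala-cli script, `println(AwsEnvCheck.presentVars(sys.env).mkString(", "))` lists the variables that are set. If `AwsEnvCheck.usesWebIdentity(sys.env)` is true in the failing environment, that would at least be consistent with the IRSA theory, though it does not prove it.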
Any news? 🙂
Closing as it seems to be fixed in v1.4.1