duckdb_aws

[v1.2.1 Regression] Can't connect to S3 with `v1.2.1` while it works with `v1.2.0`

Open jivanic-demystdata opened this issue 9 months ago • 4 comments

Hey 🙂

We updated DuckDB from v1.2.0 to v1.2.1 and, after some testing, confirmed that there's a regression in v1.2.1. In v1.2.0, DuckDB could connect to our S3 bucket to fetch data. Since v1.2.1, DuckDB fails with the following error:

HTTP Error: HTTP GET error on 'https://<my_bucket>.s3.amazonaws.com/8440bb77-ca6f-4e70-b125-1591b410add0/small.csv' (HTTP 403)

Stack trace (we use the Java integration):

org.duckdb.DuckDBNative.duckdb_jdbc_execute(Native Method)
org.duckdb.DuckDBPreparedStatement.execute(DuckDBPreparedStatement.java:148)
org.duckdb.DuckDBPreparedStatement.execute(DuckDBPreparedStatement.java:127)
org.duckdb.DuckDBPreparedStatement.executeQuery(DuckDBPreparedStatement.java:180)
org.duckdb.DuckDBPreparedStatement.executeQuery(DuckDBPreparedStatement.java:208)

Seems like DuckDB fails to pick up the credentials from the AWS server environment 🤔

Here's what we do (Scala code):

val conn = DriverManager.getConnection("jdbc:duckdb:").unwrap(classOf[DuckDBConnection])
val stmt = conn.createStatement()
stmt.execute(
  """
    |CREATE SECRET (
    |  TYPE S3,
    |  PROVIDER CREDENTIAL_CHAIN
    |);
    |INSTALL iceberg;
    |INSTALL aws;
    |LOAD iceberg;
    |LOAD aws;
  """.stripMargin.trim
)

stmt.executeQuery("FROM sniff_csv('s3://<my_bucket>/8440bb77-ca6f-4e70-b125-1591b410add0/small.csv')")

This code works in v1.2.0 but fails with the error above in v1.2.1.
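
For anyone debugging the same symptom, here is a minimal diagnostic sketch (not part of the original repro). It loads the aws extension first, creates the credential-chain secret, and prints what DuckDB stored via `duckdb_secrets()`, so you can see whether any credentials were resolved before the S3 request is made. The helper names `createSecretSql`, `dumpSecrets` and `diagnose`, and the secret name `diag_s3`, are made up for illustration. The optional `CHAIN` parameter restricts which providers the chain tries; whether `sts` is the right value for a given setup is an assumption to verify.

```scala
//> using dep org.duckdb:duckdb_jdbc:1.2.1

import java.sql.{Connection, DriverManager}

// Hypothetical helper: builds the CREATE SECRET statement, optionally
// restricting the credential chain to specific providers (e.g. "sts" or "env;config").
def createSecretSql(chain: Option[String]): String = {
  val chainClause = chain.map(c => s",\n  CHAIN '$c'").getOrElse("")
  s"CREATE OR REPLACE SECRET diag_s3 (\n  TYPE S3,\n  PROVIDER CREDENTIAL_CHAIN$chainClause\n);"
}

// Hypothetical helper: prints one line per secret known to DuckDB.
def dumpSecrets(conn: Connection): Unit = {
  val stmt = conn.createStatement()
  try {
    val rs = stmt.executeQuery(
      "SELECT name, type, provider, secret_string FROM duckdb_secrets()"
    )
    while (rs.next()) {
      val fields = (1 to 4).map(i => Option(rs.getString(i)).getOrElse("null"))
      println(fields.mkString("\t"))
    }
  } finally stmt.close()
}

// Hypothetical helper: loads the aws extension, creates the secret, dumps the result.
def diagnose(chain: Option[String]): Unit = {
  val conn = DriverManager.getConnection("jdbc:duckdb:")
  try {
    val stmt = conn.createStatement()
    stmt.execute("INSTALL aws")
    stmt.execute("LOAD aws")
    stmt.execute(createSecretSql(chain))
    stmt.close()
    dumpSecrets(conn)
  } finally conn.close()
}

// Usage (needs network access to install the extension):
//   diagnose(None)         // default credential chain
//   diagnose(Some("sts"))  // assumption: restrict the chain to the STS provider
```

Running `diagnose(None)` under v1.2.0 and again under v1.2.1 and comparing the printed `secret_string` values might show whether the chain comes back without a key in the newer version.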

Cc @guizmaii

jivanic-demystdata avatar Mar 12 '25 01:03 jivanic-demystdata

Hey @jivanic-demystdata, thanks for reporting. Would you mind posting the output of running pragma extension_versions; in duckdb?

~also, where are your credentials located? (e.g. in env vars, the ~/.aws/credentials file, etc)~ nvm i did not read properly..

samansmink avatar Mar 12 '25 08:03 samansmink

I did this:

//> using dep org.duckdb:duckdb_jdbc:1.2.1

import java.sql.DriverManager
import org.duckdb.{DuckDBConnection, DuckDBResultSet, DuckDBStruct}

val conn = DriverManager.getConnection("jdbc:duckdb:").unwrap(classOf[DuckDBConnection])
val stmt = conn.createStatement()
stmt.execute(
  """
    |CREATE SECRET (
    |  TYPE S3,
    |  PROVIDER CREDENTIAL_CHAIN
    |);
    |INSTALL iceberg;
    |INSTALL aws;
    |LOAD iceberg;
    |LOAD aws;
  """.stripMargin.trim
)

val rs = stmt.executeQuery("pragma extension_versions;")

val metaData    = rs.getMetaData
val columnCount = metaData.getColumnCount

// Print column names (optional)
for (i <- 1 to columnCount) {
  print(metaData.getColumnName(i) + "\t")
}
println()

// Iterate through results
while (rs.next()) {
  // Access each column by index
  for (i <- 1 to columnCount) {
    // getString works for most data types in a display context
    val s = rs.getString(i)
    // getString may return null for SQL NULL, so guard before calling isBlank
    if (s == null || s.isBlank) print("null\t") else print(s + "\t")
  }
  println()
}

This code is a scala-cli script. To run it:

  1. Install scala-cli (see https://scala-cli.virtuslab.org/install)
  2. Create a new file named duck.sc
  3. Copy/paste the previous code into the duck.sc file
  4. Run it with scala-cli duck.sc

(I print null when there's no value just to make it easier to format the data below)

And it prints this:

extension_name  extension_version  install_mode       installed_from
aws             b3050f3            REPOSITORY         core
core_functions  null               STATICALLY_LINKED  null
httpfs          85ac466            REPOSITORY         core
iceberg         43b4e37            REPOSITORY         core
icu             null               STATICALLY_LINKED  null
json            null               STATICALLY_LINKED  null
parquet         null               STATICALLY_LINKED  null

jivanic-demystdata avatar Mar 12 '25 23:03 jivanic-demystdata

@samansmink Is this fixed in 1.2.2?

jivanic-demystdata avatar Apr 09 '25 09:04 jivanic-demystdata

Just for the record: I was looking at this from the JDBC side and am adding details from Discord. Assuming that IAM roles for service accounts (IRSA) worked with v1.2.0 and broke in v1.2.1, it looks like only the duckdb/duckdb-httpfs#21 and #68 changes were added between these two versions. I cannot tell whether these changes are relevant to the IRSA problem, so I'm leaving this to @samansmink.
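
In case it helps narrow things down on the IRSA side, here is a small sketch that checks, from inside the same JVM process, whether the two environment variables IRSA relies on (`AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE`) are set and whether the token file is readable. The helper names `irsaEnvReport` and `tokenFileReadable` are made up for illustration, and this only inspects the environment; it says nothing about what DuckDB does with it.

```scala
import java.nio.file.{Files, Paths}

// Environment variables that EKS injects into pods using IRSA (standard AWS names).
val irsaVars = Seq("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")

// Hypothetical helper: for each variable, reports whether it is set to a non-empty value.
def irsaEnvReport(env: Map[String, String]): Seq[(String, Boolean)] =
  irsaVars.map(name => name -> env.get(name).exists(_.trim.nonEmpty))

// Hypothetical helper: true only if the token file variable points at a readable file.
def tokenFileReadable(env: Map[String, String]): Boolean =
  env
    .get("AWS_WEB_IDENTITY_TOKEN_FILE")
    .filter(_.trim.nonEmpty)
    .exists(path => Files.isReadable(Paths.get(path)))

// Report on the environment of the current process.
val env = sys.env
irsaEnvReport(env).foreach { case (name, ok) => println(s"$name set: $ok") }
println(s"token file readable: ${tokenFileReadable(env)}")
```

If both variables are set and the file is readable under v1.2.0 and v1.2.1 alike, the environment itself is probably not what changed between the two versions.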

staticlibs avatar Apr 24 '25 10:04 staticlibs

Any news? 🙂

jivanic-demystdata avatar May 26 '25 06:05 jivanic-demystdata

Closing as it seems to be fixed in v1.4.1.

jivanic-demystdata avatar Oct 29 '25 23:10 jivanic-demystdata