snowflake - cannot parse the metadata with duckdb v1.3.2

Open whatsthecraic opened this issue 4 months ago • 8 comments

We found that duckdb 1.2 is able to properly parse the metadata snapshots created with snowflake, whereas duckdb v1.3.2 fails on the following document: https://gist.github.com/whatsthecraic/5b31954f9b94169559e6a4fe92de1ddb

The error returned by duckdb is:

D SELECT * FROM iceberg_scan('s3://bucket/path/to/metadata/pointer.json');
Invalid Input Error:
Object2 required property 'operation' is missing

The field operation is indeed missing in some of the snapshots.
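To check which snapshots are affected, a small script along these lines can list them. This is only a sketch: it assumes the usual Iceberg table-metadata layout (a top-level "snapshots" array whose entries carry a "summary" object), and both the helper name and the sample document are made up for illustration, not taken from the gist.

```python
def snapshots_missing_operation(metadata: dict) -> list:
    """Return the snapshot-ids whose summary has no 'operation' key."""
    missing = []
    for snap in metadata.get("snapshots", []):
        # Treat an absent or null summary the same as an empty one.
        summary = snap.get("summary") or {}
        if "operation" not in summary:
            missing.append(snap.get("snapshot-id"))
    return missing


# Hypothetical metadata fragment; the ids and values are invented.
sample_metadata = {
    "format-version": 2,
    "snapshots": [
        {"snapshot-id": 1, "summary": {"operation": "append"}},
        {"snapshot-id": 2, "summary": {"added-records": "2"}},
        {"snapshot-id": 3},
    ],
}

print(snapshots_missing_operation(sample_metadata))
```

For a real table, the dict would come from `json.load` on the metadata file that the root pointer refers to.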

The snapshots were created on snowflake with the following statements:

CREATE ICEBERG TABLE test1 ( A int ) CATALOG = 'snowflake' EXTERNAL_VOLUME = 'my_volume';

// Insert some data
INSERT INTO test1 VALUES (10);
INSERT INTO test1 VALUES (10), (20);
INSERT INTO test1 VALUES (100), (200), (300);

// retrieve the metadata/root pointer
SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION('test1'); 

whatsthecraic avatar Aug 08 '25 08:08 whatsthecraic

Hi @whatsthecraic,

Thank you for filing the issue! We will take a look and try to finish the iceberg read regardless of the operation type. We don't have a snowflake testing suite yet, which is why this has not been caught. Another reason is that operation is a required field according to the REST spec, so it is a field Snowflake should add when creating snapshots.

Tmonster avatar Aug 08 '25 12:08 Tmonster

This sounds like #374

To fix this, duckdb-avro needs to be able to read the toplevel metadata so we can act on the "format-version" in the file, rather than on the version in the metadata.json

Tishj avatar Aug 13 '25 23:08 Tishj

Also just stumbled upon this issue while reading Iceberg metadata from Snowflake.

This seems similar to this pyiceberg issue https://github.com/apache/iceberg-python/issues/1106

jonas-w avatar Aug 29 '25 15:08 jonas-w

@Tmonster would you accept a PR where we do a similar change as in pyiceberg?

https://github.com/apache/iceberg-python/pull/1263

nicornk avatar Aug 29 '25 16:08 nicornk

I stumbled on the issue and opened a support case with Snowflake

florian-ernst-alan avatar Sep 04 '25 16:09 florian-ernst-alan

This is the response I received - let's see how long the fix takes. "The engineering team has determined that the issue is caused by a serialization issue around history snapshots, which they are working on fixing."

nicornk avatar Sep 05 '25 06:09 nicornk

@Tmonster would you accept a PR where we do a similar change as in pyiceberg? https://github.com/apache/iceberg-python/pull/1263

Hi @nicornk,

Yes, happy to accept a PR that will assume an operation value in the summary if it is missing 👍. DuckDB should be able to read as much of an iceberg table as it can when possible.
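For illustration, the "assume an operation value" idea could look roughly like the Python sketch below, applied to a parsed metadata document. The function name is hypothetical, and the default of "overwrite" is an assumption chosen only because it is one of the operation values the Iceberg spec allows. This thread does not state which value the pyiceberg change or the eventual DuckDB fix uses.

```python
import copy

# Assumption: this thread does not say which default a real fix uses.
ASSUMED_OPERATION = "overwrite"


def with_default_operation(metadata: dict, default: str = ASSUMED_OPERATION) -> dict:
    """Return a copy of the metadata where every snapshot summary has an 'operation'."""
    patched = copy.deepcopy(metadata)  # leave the caller's document untouched
    for snap in patched.get("snapshots", []):
        summary = snap.get("summary") or {}
        # Only fill the gap; an existing operation value is kept as-is.
        summary.setdefault("operation", default)
        snap["summary"] = summary
    return patched


# Hypothetical input: one well-formed snapshot, one without an operation.
original = {
    "snapshots": [
        {"snapshot-id": 1, "summary": {"operation": "append"}},
        {"snapshot-id": 2, "summary": {"added-records": "2"}},
    ]
}
patched = with_default_operation(original)
print([s["summary"]["operation"] for s in patched["snapshots"]])
```

Filling the value in at parse time means the rest of the reader can keep treating operation as required, as the spec says, while still tolerating the snapshots that lack it.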

Tmonster avatar Oct 03 '25 11:10 Tmonster

Hi @Tmonster, we have prepared PR #524. Can you please take a look? Thanks.

nicornk avatar Oct 09 '25 08:10 nicornk

Can be closed

nicornk avatar Nov 29 '25 12:11 nicornk