snowflake - cannot parse the metadata with duckdb v1.3.2
We found that duckdb 1.2 is able to properly parse the metadata snapshots created with snowflake, whereas duckdb v1.3.2 fails on the following document: https://gist.github.com/whatsthecraic/5b31954f9b94169559e6a4fe92de1ddb
The error returned by duckdb is:
D SELECT * FROM iceberg_scan('s3://bucket/path/to/metadata/pointer.json');
Invalid Input Error:
Object2 required property 'operation' is missing
The field operation is indeed missing in some of the snapshots.
The snapshots were created on snowflake with the following statements:
CREATE ICEBERG TABLE test1 ( A int ) CATALOG = 'snowflake' EXTERNAL_VOLUME = 'my_volume';
// Insert some data
INSERT INTO test1 VALUES (10);
INSERT INTO test1 VALUES (10), (20);
INSERT INTO test1 VALUES (100), (200), (300);
// retrieve the metadata/root pointer
SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION('test1');
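
To check which snapshots are affected, something like the following Python sketch could be run against the parsed metadata JSON. It assumes the standard Iceberg table-metadata layout (a top-level "snapshots" list whose entries carry "snapshot-id" and "summary"); the function name and the sample data are hypothetical and not taken from the gist.

```python
def snapshots_missing_operation(metadata: dict) -> list:
    """Return the snapshot-ids whose summary exists but has no 'operation' key."""
    missing = []
    for snapshot in metadata.get("snapshots", []):
        summary = snapshot.get("summary")
        # Only summaries that are present but incomplete are reported.
        if isinstance(summary, dict) and "operation" not in summary:
            missing.append(snapshot["snapshot-id"])
    return missing


# Hypothetical, heavily trimmed metadata document for illustration only.
sample_metadata = {
    "format-version": 2,
    "snapshots": [
        {"snapshot-id": 101, "summary": {"operation": "append"}},
        {"snapshot-id": 102, "summary": {"added-records": "2"}},
        {"snapshot-id": 103, "summary": {"operation": "append"}},
    ],
}

print(snapshots_missing_operation(sample_metadata))  # prints [102]
```

In practice the dictionary would come from `json.load` on the metadata file that the root pointer refers to.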
Hi @whatsthecraic,
Thank you for filing the issue! We will take a look and try to finish the Iceberg read regardless of the operation type. We don't have a Snowflake testing suite yet, which is why this has not been caught. Another reason is that operation is a required field according to the REST spec. This is a field Snowflake should add when creating snapshots.
This sounds like #374
To fix this, duckdb-avro needs to be able to read the top-level metadata so we can act on the "format-version" in the file, rather than on the version in the metadata.json.
Also just stumbled upon this issue while reading Iceberg metadata from Snowflake.
This seems similar to this pyiceberg issue https://github.com/apache/iceberg-python/issues/1106
@Tmonster would you accept a PR where we do a similar change as in pyiceberg?
https://github.com/apache/iceberg-python/pull/1263
I stumbled on the issue and opened a support case with Snowflake
This is the response I received - let's see how long the fix takes. "The engineering team has determined that the issue is caused by a serialization issue around history snapshots, which they are working on fixing."
@Tmonster would you accept a PR where we do a similar change as in pyiceberg? https://github.com/apache/iceberg-python/pull/1263
Hi @nicornk,
Yes, happy to accept a PR that will assume an operation value in the summary if it is missing 👍. DuckDB should be able to read as much of an Iceberg table as it can whenever possible.
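
As a rough illustration of "assume an operation value if it is missing", here is a minimal Python sketch that works on the parsed metadata JSON. It is not the DuckDB implementation and not the contents of any PR; the function name, the constant, and the choice of "overwrite" as the fallback are assumptions made for this example.

```python
import warnings
from copy import deepcopy

# Assumption: fallback value chosen for illustration only.
ASSUMED_OPERATION = "overwrite"


def fill_missing_operation(metadata: dict, default: str = ASSUMED_OPERATION) -> dict:
    """Return a copy of the metadata where every existing snapshot summary
    has an 'operation' key, filling in `default` where it was missing."""
    patched = deepcopy(metadata)  # leave the caller's document untouched
    for snapshot in patched.get("snapshots", []):
        summary = snapshot.get("summary")
        if isinstance(summary, dict) and "operation" not in summary:
            warnings.warn(
                f"snapshot {snapshot.get('snapshot-id')}: summary has no "
                f"'operation', assuming '{default}'"
            )
            summary["operation"] = default
    return patched


# Hypothetical, heavily trimmed metadata document for illustration only.
broken_metadata = {
    "snapshots": [
        {"snapshot-id": 1, "summary": {"operation": "append"}},
        {"snapshot-id": 2, "summary": {"added-records": "2"}},
    ]
}

fixed_metadata = fill_missing_operation(broken_metadata)
```

Emitting a warning rather than failing keeps the table readable while still making the non-conforming snapshot visible to the user.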
Hi @Tmonster, we have prepared PR #524. Can you please take a look? Thanks
Can be closed