Metadata Log Entries metadata table
Resolves #594 (and part of #511)
This PR adds a metadata table for "Metadata Log Entries", similar to its Spark equivalent (`metadata_log_entries`).
To query the metadata table, use:

```python
tbl.inspect.metadata_log_entries()
```
References
- #524 (snapshots metadata table)
- #602 (references metadata table)
- #551 (entries metadata table)
The Spark metadata log entries table is implemented in `MetadataLogEntriesTable.java`.
The metadata log is modified during `TableMetadata` creation, when the current metadata log entry is appended (1, 2, 3). This leads to surprising behavior: the timestamp in the last row of the metadata log entries table depends on when the query ran.
For example,
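```python
import time

# `spark` (a SparkSession) and `identifier` (the table name) are defined earlier in the session
a = spark.sql(f"SELECT * FROM {identifier}.metadata_log_entries").toPandas()
time.sleep(5)
b = spark.sql(f"SELECT * FROM {identifier}.metadata_log_entries").toPandas()
```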
```
(Pdb) display(a)
display (a):
                 timestamp                                               file  latest_snapshot_id  latest_schema_id  latest_sequence_number
0  2024-04-28 17:21:31.336  s3://warehouse/default/table_metadata_log_entr...                 NaN               NaN                     NaN
1  2024-04-28 17:21:31.531  s3://warehouse/default/table_metadata_log_entr...        4.105762e+18               0.0                     0.0
2  2024-04-28 17:21:31.600  s3://warehouse/default/table_metadata_log_entr...        7.201925e+18               0.0                     0.0
3  2024-04-28 17:21:34.204  s3://warehouse/default/table_metadata_log_entr...        1.984627e+18               0.0                     0.0
(Pdb) display(b)
display (b):
                 timestamp                                               file  latest_snapshot_id  latest_schema_id  latest_sequence_number
0  2024-04-28 17:21:31.336  s3://warehouse/default/table_metadata_log_entr...                 NaN               NaN                     NaN
1  2024-04-28 17:21:31.531  s3://warehouse/default/table_metadata_log_entr...        4.105762e+18               0.0                     0.0
2  2024-04-28 17:21:31.600  s3://warehouse/default/table_metadata_log_entr...        7.201925e+18               0.0                     0.0
3  2024-04-28 17:21:42.336  s3://warehouse/default/table_metadata_log_entr...        1.984627e+18               0.0                     0.0
```
Notice that the timestamp in the last row differs between `a` and `b` (17:21:34.204 vs. 17:21:42.336) by more than 5 seconds, even though the latest snapshot ID is the same.
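The Java behavior above suggests stamping the current entry with a value stored in the metadata itself rather than one produced at query time. A minimal sketch of that idea, assuming the rows are the previous `metadata-log` entries plus one entry for the current metadata file, stamped with the metadata's stored `last-updated-ms`. `MetadataLogEntry`, `TableMetadataView`, and `build_log_entries` are hypothetical names for illustration, not this PR's code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class MetadataLogEntry:
    # Hypothetical mirror of one `metadata-log` item in the table metadata JSON.
    metadata_file: str
    timestamp_ms: int


@dataclass(frozen=True)
class TableMetadataView:
    # Hypothetical, minimal view of the table metadata fields this sketch needs.
    metadata_location: str  # location of the current metadata file
    last_updated_ms: int  # `last-updated-ms` as stored in that file
    metadata_log: List[MetadataLogEntry] = field(default_factory=list)  # previous files


def build_log_entries(meta: TableMetadataView) -> List[MetadataLogEntry]:
    """Return the previous metadata files plus one entry for the current file.

    The current entry is stamped with the stored `last_updated_ms` rather than
    the time of the query, so repeated calls on the same metadata agree.
    """
    current = MetadataLogEntry(meta.metadata_location, meta.last_updated_ms)
    # Build a new list so the metadata's own log is never mutated.
    return list(meta.metadata_log) + [current]


# Hypothetical paths and timestamps, for illustration only.
meta = TableMetadataView(
    metadata_location="s3://bucket/t/metadata/v3.metadata.json",
    last_updated_ms=3_000,
    metadata_log=[
        MetadataLogEntry("s3://bucket/t/metadata/v1.metadata.json", 1_000),
        MetadataLogEntry("s3://bucket/t/metadata/v2.metadata.json", 2_000),
    ],
)
first = build_log_entries(meta)
second = build_log_entries(meta)  # a later "query" against the same metadata
print(first == second)  # True
```

Because nothing in `build_log_entries` reads the clock, the last row only changes when a new metadata file is written.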
Getting the snapshot as of a timestamp (`_snapshot_as_of_timestamp_ms`) is modeled after `snapshotIdAsOfTime` in the Java implementation.
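A minimal sketch of a snapshot-as-of-timestamp lookup, assuming it works like the Java helper: walk the snapshot log in chronological order and keep the last snapshot whose timestamp is at or before the requested time. `SnapshotLogEntry` and `snapshot_id_as_of_timestamp` are hypothetical names. Returning `None` (rather than raising) when no snapshot qualifies is an assumption, chosen so that a metadata log row written before the first snapshot can have empty snapshot columns, like the `NaN` row in the Spark output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SnapshotLogEntry:
    # Hypothetical mirror of one `snapshot-log` item: which snapshot became current, and when.
    snapshot_id: int
    timestamp_ms: int


def snapshot_id_as_of_timestamp(
    snapshot_log: List[SnapshotLogEntry], timestamp_ms: int
) -> Optional[int]:
    """Return the id of the snapshot that was current at `timestamp_ms`.

    Assumes `snapshot_log` is in chronological order. Keeps the last entry whose
    timestamp is at or before `timestamp_ms`; returns None if there is none.
    """
    result: Optional[int] = None
    for entry in snapshot_log:
        if entry.timestamp_ms <= timestamp_ms:
            result = entry.snapshot_id
    return result


log = [SnapshotLogEntry(101, 1_000), SnapshotLogEntry(102, 2_000), SnapshotLogEntry(103, 3_000)]
print(snapshot_id_as_of_timestamp(log, 500))    # None: before the first snapshot
print(snapshot_id_as_of_timestamp(log, 2_000))  # 102: the boundary is inclusive
print(snapshot_id_as_of_timestamp(log, 2_500))  # 102
```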
There is an issue when reading V1 metadata: the snapshot `sequence-number` is `None` instead of `0`. According to the Iceberg spec, when reading v1 metadata for v2, the Snapshot field `sequence-number` must default to 0 (source).
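A minimal sketch of the defaulting rule, assuming snapshots are parsed from the metadata JSON into a plain dataclass. `Snapshot` and `snapshot_from_dict` are hypothetical and do not reflect how this project actually models snapshots; the point is only that a missing (or null) `sequence-number` is read as `0`.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, minimal snapshot model for this sketch.
    snapshot_id: int
    timestamp_ms: int
    sequence_number: int


def snapshot_from_dict(data: Dict[str, Any]) -> Snapshot:
    # v1 metadata has no `sequence-number`; the spec says to read it as 0.
    seq = data.get("sequence-number")
    return Snapshot(
        snapshot_id=data["snapshot-id"],
        timestamp_ms=data["timestamp-ms"],
        sequence_number=0 if seq is None else seq,
    )


v1 = json.loads('{"snapshot-id": 1, "timestamp-ms": 1000}')
v2 = json.loads('{"snapshot-id": 2, "timestamp-ms": 2000, "sequence-number": 5}')
print(snapshot_from_dict(v1).sequence_number)  # 0
print(snapshot_from_dict(v2).sequence_number)  # 5
```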