[BUG] Checkpoint documentation for writers is ambiguous
The checkpoint documentation in the protocol specification is ambiguous and could be more concrete. The requirements for writers say that each row in a checkpoint is an action. Later, the checkpoint schema section indicates that every row has a column for every action (i.e. it has 5 columns). Also, every action has a different schema.
What is the parquet schema for the checkpoint?
The Delta log entries documentation is clear: it says that the log contains multiple actions separated by newlines. It looks like something similar is intended for the checkpoint after reconciliation. I am looking for clarity on the schema.
This seems related to #940
@vigneshc - the checkpoint schema section you linked is correct. Every row has a column for each action type, each action has a different schema, and at runtime we expect exactly one column per row to be non-null and all the other columns to be null.
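To make the "one non-null column per row" rule concrete, here is a minimal sketch in plain Python. It is not taken from PROTOCOL.md or from any Delta library: the column names in `ACTION_COLUMNS` are an assumption based on the five action columns mentioned above, the helpers `to_checkpoint_row` and `is_valid_row` are hypothetical, and the sample log lines are made up for illustration. It models rows as dictionaries rather than writing real Parquet.

```python
import json

# Assumed names of the five action columns in a checkpoint row.
ACTION_COLUMNS = ("txn", "add", "remove", "metaData", "protocol")


def to_checkpoint_row(log_line: str) -> dict:
    """Turn one newline-delimited JSON log action into a checkpoint-style row.

    Hypothetical helper: the resulting row has every action column, with only
    the column for this action populated and all the others set to None.
    """
    action = json.loads(log_line)
    if len(action) != 1:
        raise ValueError("expected exactly one action per log line")
    (name, payload), = action.items()
    if name not in ACTION_COLUMNS:
        raise ValueError(f"action not stored in this sketch: {name}")
    row = {column: None for column in ACTION_COLUMNS}
    row[name] = payload
    return row


def is_valid_row(row: dict) -> bool:
    """Check that exactly one action column in the row is non-null."""
    non_null = [c for c in ACTION_COLUMNS if row.get(c) is not None]
    return len(non_null) == 1


# Made-up log lines, one action per line, as in a Delta log entry.
log_lines = [
    '{"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}}',
    '{"add": {"path": "part-00000.parquet", "size": 1024, "dataChange": true}}',
]
rows = [to_checkpoint_row(line) for line in log_lines]
```

In this sketch the second row ends up with `add` populated and `txn`, `remove`, `metaData`, and `protocol` all `None`, which is the shape the checkpoint schema section describes.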
Want to make a PR to update PROTOCOL.md?