
[BUG] Checkpoint documentation for writers is ambiguous

Open vigneshc opened this issue 3 years ago • 2 comments

The checkpoint documentation in the protocol specification is ambiguous and could be more concrete. The requirements for writers say that each row in a checkpoint is an action. Later, the checkpoint schema indicates that every row has a column for every action type (i.e., it has 5 columns). Also, every action has a different schema.

What is the parquet schema for the checkpoint?

The Delta log entries documentation is clear: it says that the log contains multiple actions separated by newlines. It looks like something similar is intended for the checkpoint after reconciliation. I am looking for clarity on the schema.

vigneshc avatar Jul 21 '22 17:07 vigneshc

This seems related to #940

allisonport-db avatar Jul 26 '22 21:07 allisonport-db

@vigneshc - the checkpoint schema section you linked is correct. Every row has a column for each action type, each action has a different schema, and at runtime we expect exactly one column per row to be non-null and all the other columns to be null.
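
As a rough illustration of that row layout (not taken from PROTOCOL.md), the sketch below models checkpoint rows as plain Python dicts. The five column names are an assumption based on the "5 columns" mentioned in the issue; the payload fields and the helpers `checkpoint_row` and `is_valid_row` are hypothetical and exist only for this example.

```python
# Assumed column names: one column per action type stored in a checkpoint.
# These are an assumption for illustration; consult PROTOCOL.md for the
# authoritative list and for the nested schema of each column.
CHECKPOINT_COLUMNS = ("txn", "add", "remove", "metaData", "protocol")


def checkpoint_row(action_type, payload):
    """Build one row: the column for `action_type` holds the payload,
    every other column is None (null)."""
    if action_type not in CHECKPOINT_COLUMNS:
        raise ValueError(f"unknown action type: {action_type!r}")
    return {
        col: (payload if col == action_type else None)
        for col in CHECKPOINT_COLUMNS
    }


def is_valid_row(row):
    """True when the row has exactly the expected columns and
    exactly one of them is non-null."""
    has_all_columns = set(row) == set(CHECKPOINT_COLUMNS)
    non_null_count = sum(value is not None for value in row.values())
    return has_all_columns and non_null_count == 1


# Two example rows with illustrative (incomplete) payloads.
rows = [
    checkpoint_row("protocol", {"minReaderVersion": 1, "minWriterVersion": 2}),
    checkpoint_row("add", {"path": "part-00000.parquet", "size": 1024}),
]
```

Each row carries every column, so all rows share a single schema, while each row still represents exactly one action. A row with two non-null columns, or with none, would fail `is_valid_row`.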

Want to make a PR to update the PROTOCOL.md?

scottsand-db avatar Aug 10 '22 02:08 scottsand-db