Puffin: Add delete-vector-v1 blob type
This adds a blob type to the Puffin spec that can store a Roaring bitmap delete vector. This is in support of the row-level delete improvements proposed for Iceberg v3.
I am going to share a PR with a basic implementation that follows this spec. We can use it as an example that will hopefully clarify some questions. Thanks for putting this together, @rdblue!
I’m not sure it’s worth drawing a line in the sand over this particular issue and I’d like to talk about it a bit more as a community before we merge this. I don’t want to set a precedent of adding write requirements to the Iceberg spec that aren’t actually requirements for Iceberg. I feel like if we make this a pattern we will essentially be deferring design decisions and I don’t really feel comfortable with that.
This is my main concern. I don't think the technical differences here really present blockers; they just add some warts. I also think compatibility between table formats is a good goal, but I worry that due to governance differences between Iceberg and Delta, things will naturally go slower in Iceberg, so we would in most cases likely be ceding design to another project. I'm happy to take a wait-and-see approach on the more philosophical issue here and move forward on this (ultimately I think people doing the work should have more of a say on approach).
PR #11302 contains a sample implementation of this spec.
I agree that we don't want to cede design to another project or set a precedent. This should be an independent choice of whether we want to maintain compatibility in this case, based on weighing the benefits against the costs. This is definitely an Iceberg community decision.
To me, reducing fragmentation across formats is worth the cost of a few warts.
I think that compatibility with other table formats is a great goal, but I do want to stress that I value our ability to read other formats much more highly than our ability to write other formats.
This is true, but I'm not sure that we've had a case before where we know that we want to build basically the same thing. And in this case, if we want compatibility with existing code, then we would need to make sure we write the fields to keep the other readers functioning.
I also think that if the community roles were reversed in a similar situation, we would want the Delta community to consider compatibility when building a very similar feature, too.
The vote has passed, so I merged this PR. Thanks @rdblue! Thanks everyone who reviewed!
Linking this PR to #11122 for tracking.