materialize icon indicating copy to clipboard operation
materialize copied to clipboard

Enforce non-zero group size hints

Open antiguru opened this issue 1 year ago • 3 comments

A group size hint of 0 isn't well-defined, so make it equivalent to disabling the hint. This is arguably an opinionated decision, but here it serves a single purpose: Make MirRelationExpr eight bytes smaller.

Before:

print-type-size type: `relation::MirRelationExpr`: 184 bytes, alignment: 8 bytes
print-type-size     variant `TopK`: 177 bytes
print-type-size         field `.expected_group_size`: 16 bytes
print-type-size         field `.limit`: 96 bytes
print-type-size         field `.group_key`: 24 bytes
print-type-size         field `.order_key`: 24 bytes
print-type-size         field `.input`: 8 bytes
print-type-size         field `.offset`: 8 bytes
print-type-size         field `.monotonic`: 1 bytes
print-type-size     variant `Join`: 176 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.implementation`: 120 bytes, alignment: 8 bytes
print-type-size         field `.inputs`: 24 bytes
print-type-size         field `.equivalences`: 24 bytes
print-type-size     variant `Constant`: 112 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.typ`: 48 bytes, alignment: 8 bytes
print-type-size         field `.rows`: 56 bytes
print-type-size     variant `FlatMap`: 112 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.exprs`: 24 bytes, alignment: 8 bytes
print-type-size         field `.func`: 72 bytes
print-type-size         field `.input`: 8 bytes
print-type-size     variant `Get`: 96 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.id`: 16 bytes, alignment: 8 bytes
print-type-size         field `.typ`: 48 bytes
print-type-size         field `.access_strategy`: 24 bytes
print-type-size     variant `LetRec`: 88 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.ids`: 24 bytes, alignment: 8 bytes
print-type-size         field `.values`: 24 bytes
print-type-size         field `.limits`: 24 bytes
print-type-size         field `.body`: 8 bytes
print-type-size     variant `Reduce`: 81 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.expected_group_size`: 16 bytes, alignment: 8 bytes
print-type-size         field `.group_key`: 24 bytes
print-type-size         field `.aggregates`: 24 bytes
print-type-size         field `.input`: 8 bytes
print-type-size         field `.monotonic`: 1 bytes
print-type-size     variant `Project`: 40 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.outputs`: 24 bytes, alignment: 8 bytes
print-type-size         field `.input`: 8 bytes
print-type-size     variant `Map`: 40 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.scalars`: 24 bytes, alignment: 8 bytes
print-type-size         field `.input`: 8 bytes
print-type-size     variant `Filter`: 40 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.predicates`: 24 bytes, alignment: 8 bytes
print-type-size         field `.input`: 8 bytes
print-type-size     variant `Union`: 40 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.inputs`: 24 bytes, alignment: 8 bytes
print-type-size         field `.base`: 8 bytes
print-type-size     variant `ArrangeBy`: 40 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.keys`: 24 bytes, alignment: 8 bytes
print-type-size         field `.input`: 8 bytes
print-type-size     variant `Let`: 32 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.value`: 8 bytes, alignment: 8 bytes
print-type-size         field `.body`: 8 bytes
print-type-size         field `.id`: 8 bytes
print-type-size     variant `Negate`: 16 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.input`: 8 bytes, alignment: 8 bytes
print-type-size     variant `Threshold`: 16 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.input`: 8 bytes, alignment: 8 bytes
print-type-size     end padding: 7 bytes

After:

print-type-size type: `relation::MirRelationExpr`: 176 bytes, alignment: 8 bytes
print-type-size     variant `Join`: 176 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.implementation`: 120 bytes, alignment: 8 bytes
print-type-size         field `.inputs`: 24 bytes
print-type-size         field `.equivalences`: 24 bytes
print-type-size     variant `TopK`: 169 bytes
print-type-size         field `.limit`: 96 bytes
print-type-size         field `.group_key`: 24 bytes
print-type-size         field `.order_key`: 24 bytes
print-type-size         field `.input`: 8 bytes
print-type-size         field `.offset`: 8 bytes
print-type-size         field `.expected_group_size`: 8 bytes
print-type-size         field `.monotonic`: 1 bytes
print-type-size     variant `Constant`: 112 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.typ`: 48 bytes, alignment: 8 bytes
print-type-size         field `.rows`: 56 bytes
print-type-size     variant `FlatMap`: 112 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.exprs`: 24 bytes, alignment: 8 bytes
print-type-size         field `.func`: 72 bytes
print-type-size         field `.input`: 8 bytes
print-type-size     variant `Get`: 96 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.id`: 16 bytes, alignment: 8 bytes
print-type-size         field `.typ`: 48 bytes
print-type-size         field `.access_strategy`: 24 bytes
print-type-size     variant `LetRec`: 88 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.ids`: 24 bytes, alignment: 8 bytes
print-type-size         field `.values`: 24 bytes
print-type-size         field `.limits`: 24 bytes
print-type-size         field `.body`: 8 bytes
print-type-size     variant `Reduce`: 73 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.group_key`: 24 bytes, alignment: 8 bytes
print-type-size         field `.aggregates`: 24 bytes
print-type-size         field `.input`: 8 bytes
print-type-size         field `.expected_group_size`: 8 bytes
print-type-size         field `.monotonic`: 1 bytes
print-type-size     variant `Project`: 40 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.outputs`: 24 bytes, alignment: 8 bytes
print-type-size         field `.input`: 8 bytes
print-type-size     variant `Map`: 40 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.scalars`: 24 bytes, alignment: 8 bytes
print-type-size         field `.input`: 8 bytes
print-type-size     variant `Filter`: 40 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.predicates`: 24 bytes, alignment: 8 bytes
print-type-size         field `.input`: 8 bytes
print-type-size     variant `Union`: 40 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.inputs`: 24 bytes, alignment: 8 bytes
print-type-size         field `.base`: 8 bytes
print-type-size     variant `ArrangeBy`: 40 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.keys`: 24 bytes, alignment: 8 bytes
print-type-size         field `.input`: 8 bytes
print-type-size     variant `Let`: 32 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.value`: 8 bytes, alignment: 8 bytes
print-type-size         field `.body`: 8 bytes
print-type-size         field `.id`: 8 bytes
print-type-size     variant `Negate`: 16 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.input`: 8 bytes, alignment: 8 bytes
print-type-size     variant `Threshold`: 16 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.input`: 8 bytes, alignment: 8 bytes

Checklist

  • [ ] This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • [ ] This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • [ ] If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • [ ] If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • [ ] If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

antiguru avatar Oct 07 '24 08:10 antiguru

Why is zero group size not well defined? It's a valid size for a group

petrosagg avatar Oct 07 '24 09:10 petrosagg

(Moving this to draft as more discussions are needed! I'm not set in stone for this PR, it's just another small piece of chopping away on inflated memory usage.)

I agree that reserving 0 as a special value is odd and would like to avoid it. The group size hint has enough variants that we could use u64::MAX for example to encode None, without needing to change what 0 means. That's a change that'll need some thinking to get semantics right, but easy otherwise. Or, we offset the group size by one to enable the non-null optimizations...

I would not prefer to box the value because that would add the cost of an allocation plus the pointer, so even tho the struct would be 8 bytes smaller, we still would store 16 bytes (plus allocator overhead) on the heap.

antiguru avatar Oct 07 '24 10:10 antiguru

The group size hint has enough variants that we could use u64::MAX for example to encode None, without needing to change what 0 means. That's a change that'll need some thinking to get semantics right, but easy otherwise. Or, we offset the group size by one to enable the non-null optimizations...

Both of these options sound good to me!

ggevay avatar Oct 07 '24 11:10 ggevay