[Feature] introduce RowIdColumn

Open silverbullet233 opened this issue 5 months ago • 4 comments

Why I'm doing:

This is the first part of #60015. We need a method to uniquely identify each row of data in order to implement delayed reading.

What I'm doing:

I introduce a new data type, ROW_ID, used to globally identify a row. The ROW_ID is composed of three parts:

  • be_id: identifies a specific BE node
  • seg_id: identifies a specific segment
  • ord_id: identifies the position of the data within the segment

By using these three components, we can locate a specific row of data.
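As a rough illustration of the three-part identifier described above, here is a minimal sketch. The struct name `RowId`, the 32-bit field widths, and the lexicographic ordering are assumptions made for this example only; the PR text does not specify the actual layout used by RowIdColumn.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical layout: the PR names the three components but not their
// widths, so 32-bit fields are an assumption for illustration.
struct RowId {
    uint32_t be_id;   // identifies a specific BE node
    uint32_t seg_id;  // identifies a specific segment
    uint32_t ord_id;  // position of the row within the segment

    // Two row IDs refer to the same row only if all three parts match.
    friend bool operator==(const RowId& a, const RowId& b) {
        return std::tie(a.be_id, a.seg_id, a.ord_id) ==
               std::tie(b.be_id, b.seg_id, b.ord_id);
    }

    // Lexicographic order (BE, then segment, then ordinal). This ordering
    // is assumed here; it groups rows that live in the same segment.
    friend bool operator<(const RowId& a, const RowId& b) {
        return std::tie(a.be_id, a.seg_id, a.ord_id) <
               std::tie(b.be_id, b.seg_id, b.ord_id);
    }
};
```

Under these assumptions, `RowId{1, 7, 42}` would denote the row at ordinal 42 of segment 7 on BE node 1.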

In this PR, I have implemented only the basic building blocks for ROW_ID: RowIdColumn, the in-memory representation of ROW_ID, and GlobalRowIdColumnIterator, which returns the row ID for each row it reads. Subsequent PRs will build on these components to implement global late materialization.
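To make the role of such an iterator concrete, the sketch below generates row IDs for consecutive rows of one segment. The class `RowIdIteratorSketch`, its methods `seek_to_ordinal` and `next_batch`, and the `RowId` struct are all hypothetical names chosen for this example; they are not the interface of GlobalRowIdColumnIterator in the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical row ID layout (field widths are an assumption).
struct RowId {
    uint32_t be_id;
    uint32_t seg_id;
    uint32_t ord_id;
};

// Illustrative stand-in for an iterator that emits a row ID per row.
// No column data is read: each ID is derived from the fixed
// (be_id, seg_id) pair plus the current ordinal.
class RowIdIteratorSketch {
public:
    RowIdIteratorSketch(uint32_t be_id, uint32_t seg_id, uint32_t num_rows)
            : _be_id(be_id), _seg_id(seg_id), _num_rows(num_rows) {}

    // Move to the given ordinal, clamped to the end of the segment.
    void seek_to_ordinal(uint32_t ord) { _cur = ord < _num_rows ? ord : _num_rows; }

    // Append up to n row IDs to dst and return how many were appended.
    uint32_t next_batch(uint32_t n, std::vector<RowId>* dst) {
        uint32_t remaining = _num_rows - _cur;
        uint32_t count = n < remaining ? n : remaining;
        for (uint32_t i = 0; i < count; ++i) {
            dst->push_back(RowId{_be_id, _seg_id, _cur + i});
        }
        _cur += count;
        return count;
    }

private:
    uint32_t _be_id;
    uint32_t _seg_id;
    uint32_t _num_rows;
    uint32_t _cur = 0;  // ordinal of the next row to emit
};
```

A later stage could carry only these IDs through filtering or sorting and fetch the remaining columns for the surviving rows afterwards, which is the general idea behind late materialization.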

Fixes #60015

What type of PR is this:

  • [ ] BugFix
  • [ ] Feature
  • [x] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [ ] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [ ] 3.5
    • [ ] 3.4
    • [ ] 3.3

silverbullet233, Jun 17 '25 05:06

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot], Jun 18 '25 07:06

[FE Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot], Jun 18 '25 07:06

[BE Incremental Coverage Report]

:x: fail : 127 / 363 (34.99%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| :large_blue_circle: be/src/types/logical_type.cpp | 0 | 4 | 00.00% | [150, 151, 227, 228] |
| :large_blue_circle: be/src/exec/sorted_streaming_aggregator.cpp | 0 | 4 | 00.00% | [157, 158, 257, 258] |
| :large_blue_circle: be/src/column/column_visitor_mutable.cpp | 0 | 1 | 00.00% | [74] |
| :large_blue_circle: be/src/column/schema.cpp | 0 | 5 | 00.00% | [292, 293, 294, 295, 296] |
| :large_blue_circle: be/src/types/row_id_type_info.cpp | 0 | 16 | 00.00% | [24, 25, 27, 29, 31, 33, 34, 37, 39, 41, 43, 45, 48, 49, 54, 55] |
| :large_blue_circle: be/src/runtime/types.cpp | 0 | 2 | 00.00% | [373, 374] |
| :large_blue_circle: be/src/exec/sorting/sort_permute.cpp | 0 | 23 | 00.00% | [238, 239, 240, 242, 243, 244, 245, 247, 248, 249, 250, 252, 253, 254, 255, 256, 257, 258, 259, 263, 264, 267, 268] |
| :large_blue_circle: be/src/storage/tablet_schema.cpp | 0 | 1 | 00.00% | [90] |
| :large_blue_circle: be/src/column/column_visitor.cpp | 0 | 1 | 00.00% | [74] |
| :large_blue_circle: be/src/exec/sorting/compare_column.cpp | 0 | 6 | 00.00% | [242, 243, 244, 321, 322, 323] |
| :large_blue_circle: be/src/column/column.h | 0 | 1 | 00.00% | [111] |
| :large_blue_circle: be/src/exec/sorting/sort_column.cpp | 0 | 6 | 00.00% | [221, 222, 223, 415, 416, 417] |
| :large_blue_circle: be/src/column/row_id_column.h | 8 | 94 | 08.51% | [54, 56, 57, 58, 61, 62, 63, 67, 68, 69, 70, 71, 73, 74, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 93, 95, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 123, 124, 134, 135, 136, 138, 139, 140, 143, 144, 145, 147, 148, 149, 152, 153, 154, 157, 158, 159, 164, 168, 170, 171, 174, 175, 176, 179, 186, 195, 201, 202, 203, 204, 207, 209, 211, 215, 216, 218, 219, 220, 221, 222] |
| :large_blue_circle: be/src/storage/chunk_helper.cpp | 2 | 9 | 22.22% | [279, 300, 393, 435, 436, 790, 791] |
| :large_blue_circle: be/src/column/row_id_column.cpp | 68 | 138 | 49.28% | [29, 30, 33, 40, 41, 86, 87, 88, 90, 91, 93, 106, 107, 108, 109, 110, 113, 114, 115, 116, 117, 119, 121, 122, 124, 126, 127, 128, 129, 130, 132, 133, 134, 135, 137, 139, 140, 143, 144, 145, 146, 147, 148, 149, 152, 153, 154, 155, 156, 159, 160, 161, 162, 163, 164, 165, 166, 175, 176, 177, 178, 179, 180, 206, 207, 208, 209, 210, 211, 212] |
| :large_blue_circle: be/src/storage/types.cpp | 1 | 2 | 50.00% | [346] |
| :large_blue_circle: be/src/column/chunk.cpp | 28 | 30 | 93.33% | [417, 450] |
| :large_blue_circle: be/src/types/logical_type_infra.h | 2 | 2 | 100.00% | [] |
| :large_blue_circle: be/src/column/column_visitor_adapter.h | 2 | 2 | 100.00% | [] |
| :large_blue_circle: be/src/storage/olap_type_infra.h | 1 | 1 | 100.00% | [] |
| :large_blue_circle: be/src/serde/column_array_serde.cpp | 15 | 15 | 100.00% | [] |

github-actions[bot], Jun 18 '25 07:06