starrocks
[Feature] Support reading Hudi MOR tables in snapshot mode
What type of PR is this:
- [ ] bug
- [x] feature
- [ ] enhancement
- [ ] refactor
- [ ] others
Support reading Hudi MOR tables in snapshot mode.
Notes for review: Reviewers should read the comments in OffHeapTable.java to understand the memory layout used to exchange data with the C++ code in BE.
/**
* We store Hudi MOR table data in off-heap memory, using a custom memory layout
* that the StarRocks BE (written in C++) can parse directly.
*
* Off-heap table memory layout details:
* 1. Each data column is stored contiguously in off-heap memory.
* 2. Different data columns are stored at different locations in off-heap memory.
* 3. Each data column has a null indicator column that marks whether the value at each row is null.
* 4. A meta column stores the number of rows, the start addresses of the data columns,
* and the start addresses of the null indicator columns.
*
* Meta column layout:
* Meta column start address: | number of rows |
* | null indicator start address of fixed length column-A |
* | data column start address of the fixed length column-A |
* | ... |
* | null indicator start address of variable length column-B |
* | offset column start address of the variable length column-B |
* | length column start address of the variable length column-B |
* | data column start address of the variable length column-B |
* | ... |
*
* Null indicator column layout:
* Null column start address: | 1-byte boolean | 1-byte boolean | 1-byte boolean | ... |
* Row index: -------row 0-------------row 1------------row 2----- ... -
*
* Data column layout:
* Data columns fall into two storage types: fixed-length columns and variable-length columns.
*
* For fixed-length columns such as BOOLEAN/INT/LONG, we use single-level (direct) addressing.
* (1) Get the data column start address from the meta column.
* (2) Read the X-byte value at (column start address + row index * X), where X is the fixed width of the type.
* Fixed length column memory layout:
* Data column start address of fixed length column: | X-bytes | X-bytes | X-bytes | ... |
* INT column of 4 bytes for example:
* Fixed length column start address: | 4-bytes INT | 4-bytes INT | 4-bytes INT | ... |
* Row index: ----row 0---------row 1---------row 2----- ... -
*
*
* For variable-length columns such as STRING/DECIMAL, we use two-level (indirect) addressing.
* (1) Get the offset column, length column, and data column start addresses from the meta column.
* (2) Read the field offset from the offset column at the row index.
* (3) Read the field length from the length column at the row index.
* (4) Read (field length) bytes starting at (data column start address + field offset).
* Variable length column memory layout:
* Offset column start address of variable length column: | 4-bytes INT | 4-bytes INT | 4-bytes INT | ... |
* Length column start address of variable length column: | 4-bytes INT | 4-bytes INT | 4-bytes INT | ... |
* Data column start address of variable length column: | X-bytes | Y-bytes | Z-bytes | ... |
* STRING column for example:
* Offset column start address: | 4-bytes INT | 4-bytes INT | 4-bytes INT | ... |
* Row index: ----row 0---------row 1---------row 2----- ... -
* Length column start address: | 4-bytes INT | 4-bytes INT | 4-bytes INT | ... |
* Row index: ----row 0---------row 1---------row 2----- ... -
* Variable length column start address: | (length of row 0)-bytes | (length of row 1)-bytes | ... |
* Row 0 field starts at: column start address + offset of row 0
* Row 0 field ends at: column start address + offset of row 0 + length of row 0
*/
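To make the addressing rules in the comment above easier to review, here is a minimal Java sketch of the same layout. It is not the code in OffHeapTable.java: the class name `OffHeapLayoutSketch` and all method names are hypothetical. It emulates off-heap memory with a single direct `ByteBuffer`, so every "address" is an offset into that buffer rather than a raw pointer. It also assumes that each meta column entry is an 8-byte long, that a null indicator byte of 1 means null, and that strings are UTF-8 encoded. None of these details are stated in the comment.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only: one nullable INT column (A) and one nullable STRING column (B). */
class OffHeapLayoutSketch {
    // Assumption: every meta column entry is stored as an 8-byte long.
    static final int META_ENTRY_BYTES = 8;

    /** Writes both columns into one direct buffer using the meta/null/offset/length/data layout. */
    static ByteBuffer write(Integer[] ints, String[] strs) {
        int n = ints.length;
        byte[][] encoded = new byte[n][];
        int total = 0;
        for (int i = 0; i < n; i++) {
            encoded[i] = strs[i] == null ? new byte[0] : strs[i].getBytes(StandardCharsets.UTF_8);
            total += encoded[i].length;
        }
        // Meta entries: rows, nullA, dataA, nullB, offsetB, lengthB, dataB (7 entries).
        int nullA = 7 * META_ENTRY_BYTES;
        int dataA = nullA + n;
        int nullB = dataA + 4 * n;
        int offsetB = nullB + n;
        int lengthB = offsetB + 4 * n;
        int dataB = lengthB + 4 * n;
        ByteBuffer mem = ByteBuffer.allocateDirect(dataB + total).order(ByteOrder.nativeOrder());
        long[] meta = {n, nullA, dataA, nullB, offsetB, lengthB, dataB};
        for (int i = 0; i < meta.length; i++) {
            mem.putLong(i * META_ENTRY_BYTES, meta[i]);
        }
        int cursor = 0; // offset of the next field, relative to the data column start
        for (int i = 0; i < n; i++) {
            // Assumption: a null indicator of 1 means the value is null.
            mem.put(nullA + i, (byte) (ints[i] == null ? 1 : 0));
            mem.putInt(dataA + 4 * i, ints[i] == null ? 0 : ints[i]);
            mem.put(nullB + i, (byte) (strs[i] == null ? 1 : 0));
            mem.putInt(offsetB + 4 * i, cursor);
            mem.putInt(lengthB + 4 * i, encoded[i].length);
            for (int j = 0; j < encoded[i].length; j++) {
                mem.put(dataB + cursor + j, encoded[i][j]);
            }
            cursor += encoded[i].length;
        }
        return mem;
    }

    /** Reads one entry of the meta column, which starts at offset 0 in this sketch. */
    static long meta(ByteBuffer mem, int entry) {
        return mem.getLong(entry * META_ENTRY_BYTES);
    }

    /** Fixed-length read: a single address computation from the column start. */
    static Integer readInt(ByteBuffer mem, int row) {
        if (mem.get((int) meta(mem, 1) + row) != 0) {
            return null;
        }
        return mem.getInt((int) meta(mem, 2) + 4 * row);
    }

    /** Variable-length read: look up offset and length first, then read the field bytes. */
    static String readString(ByteBuffer mem, int row) {
        if (mem.get((int) meta(mem, 3) + row) != 0) {
            return null;
        }
        int offset = mem.getInt((int) meta(mem, 4) + 4 * row);
        int length = mem.getInt((int) meta(mem, 5) + 4 * row);
        int fieldStart = (int) meta(mem, 6) + offset;
        byte[] bytes = new byte[length];
        for (int j = 0; j < length; j++) {
            bytes[j] = mem.get(fieldStart + j);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

For example, after `write(new Integer[]{7, null, 42}, new String[]{"hudi", null, "mor"})`, `readInt(mem, 2)` follows the single-level path and `readString(mem, 2)` follows the two-level path. In this sketch a reader only needs the meta column start address to locate everything else, which appears to be the point of the meta column in the comment.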
run starrocks_be_unittest
[FE PR Coverage check]
:disappointed: fail : 6 / 73 (08.22%)
file detail
| | path | covered line | new line | coverage |
|---|---|---|---|---|
| :large_blue_circle: | com/starrocks/external/RemoteScanRangeLocations.java | 0 | 41 | 00.00% |
| :large_blue_circle: | com/starrocks/external/hive/HiveMetaClient.java | 2 | 19 | 10.53% |
| :large_blue_circle: | com/starrocks/catalog/HudiTable.java | 1 | 8 | 12.50% |
| :large_blue_circle: | com/starrocks/external/hive/HdfsFileDesc.java | 3 | 5 | 60.00% |
[FE PR Coverage Check]
:disappointed: fail : 40 / 124 (32.26%)
file detail
| | path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|---|
| :large_blue_circle: | com/starrocks/qe/Coordinator.java | 0 | 6 | 00.00% | [2970, 2971, 2972, 2973, 2974, 2975] |
| :large_blue_circle: | com/starrocks/external/RemoteScanRangeLocations.java | 0 | 42 | 00.00% | [119, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 138, 139, 141, 142, 162, 163, 164, 165, 166, 167, 169, 170, 171, 172, 173, 174, 177, 178, 179, 180, 181, 182, 183, 185, 189, 190] |
| :large_blue_circle: | com/starrocks/external/hive/HiveMetaClient.java | 2 | 19 | 10.53% | [322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338] |
| :large_blue_circle: | com/starrocks/external/hive/HdfsFileDesc.java | 3 | 5 | 60.00% | [56, 60] |
| :large_blue_circle: | com/starrocks/catalog/HudiTable.java | 35 | 52 | 67.31% | [140, 280, 281, 285, 292, 293, 295, 349, 560, 562, 563, 564, 565, 566, 567, 568, 569] |
run starrocks_admit_test