[feature](iceberg catalog) support iceberg view query
What problem does this PR solve?
This feature addresses the issue that Doris cannot query Iceberg views.
- Prerequisite dependency: Iceberg version 1.7.x or later.
- Design approach:
  - Enhanced view loading in IcebergMetadataCache:
    - Added view loading capability during loadTable operations for Iceberg.
    - Introduced the getIcebergView() method to retrieve Iceberg views.
  - New methods in IcebergExternalTable:
    - isView(): determines whether the current Iceberg table is a view.
    - getViewText(): retrieves the SQL definition of the view.
  - Execution plan generation adjustment:
    - Checks isView() during plan generation to identify Iceberg views.
    - For views, generates a logical subquery plan by flattening the view into a subquery.
  - Cache invalidation enhancement:
    - Added view cache invalidation in IcebergMetadataCache when invalidating catalog, database, or table caches.
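The plan-generation step described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `TableSketch`, `PlanSketch`, and `IcebergViewPlanSketch` are hypothetical stand-ins, the parser is replaced by a string wrapper, and the behavior when `enable_query_iceberg_views` is off is an assumption. Only the names `isView`, `getViewText`, `getLogicalPlan`, and `parseAndAnalyzeExternalView` come from the description.

```java
/** Hypothetical stand-in for an external table; only isView() and getViewText() are named in the PR. */
final class TableSketch {
    final String name;
    final String viewText; // null for a plain table (an assumption of this sketch)

    TableSketch(String name, String viewText) {
        this.name = name;
        this.viewText = viewText;
    }

    boolean isView() {
        return viewText != null;
    }

    String getViewText() {
        return viewText;
    }
}

/** Hypothetical, minimal logical plan node that only carries a textual description. */
final class PlanSketch {
    final String description;

    PlanSketch(String description) {
        this.description = description;
    }
}

final class IcebergViewPlanSketch {
    /** Stand-in for parsing and analyzing the view SQL; a real planner would build a plan tree here. */
    static PlanSketch parseAndAnalyzeExternalView(String viewText) {
        return new PlanSketch("ParsedView[" + viewText + "]");
    }

    /** Views are flattened into an aliased subquery; plain tables become a scan. */
    static PlanSketch getLogicalPlan(TableSketch table, boolean enableQueryIcebergViews) {
        if (!table.isView()) {
            return new PlanSketch("Scan(" + table.name + ")");
        }
        if (!enableQueryIcebergViews) {
            // Assumed behavior when the flag is off; the PR does not state the exact error.
            throw new UnsupportedOperationException("Iceberg view query is disabled: " + table.name);
        }
        PlanSketch body = parseAndAnalyzeExternalView(table.getViewText());
        return new PlanSketch("SubQueryAlias(" + table.name + ", " + body.description + ")");
    }

    public static void main(String[] args) {
        System.out.println(getLogicalPlan(new TableSketch("orders", null), true).description);
        System.out.println(getLogicalPlan(new TableSketch("v_orders", "SELECT id FROM orders"), true).description);
    }
}
```

The point of the sketch is the branch on `isView()`: a view never reaches the scan path, so the rest of the planner only ever sees the view's defining query wrapped under the view name.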
Problem Summary:
This PR aims to resolve the issue where querying an Iceberg view in Doris results in an error indicating the table does not exist. Specifically, it addresses the problem that Doris currently cannot recognize and query Iceberg views, causing exceptions when users attempt to access view data.
Check List (For Author)
Test (at least one of them must be included): Manual test scripts and report as below: Regression test cases.docx (original filename: 回归测试用例.docx)
Behavior changed:
Yes.
- BindRelation.java
  (1) Modified getLogicalPlan: Adjusted the logic for Iceberg external tables to generate subquery execution plans when encountering views.
  (2) Renamed parseAndAnalyzeHiveView to parseAndAnalyzeExternalView: Unified handling for external views (Hive/Iceberg).
- Config.java
  (1) Added enable_query_iceberg_views: Configuration flag to enable or disable Iceberg view query support.
- IcebergExternalTable.java
  (1) Enhanced initSchema: Added schema initialization compatibility for both Iceberg tables and views.
  (2) New getViewText method: Retrieves the SQL definition of an Iceberg view.
- IcebergMetadataCache.java
  (1) Constructor initialization: Added cache loading for Iceberg views.
  (2) Invalidation methods: Extended invalidateCatalogCache, invalidateTableCache, and invalidateDbCache to include view cache invalidation.
  New methods:
  (1) loadView: Loads Iceberg view metadata.
  (2) getIcebergView: Retrieves cached Iceberg view information.
- IcebergMetadataOps.java
  (1) Modified listTableNames: Returns a combined list of tables and views.
  New methods:
  (1) viewExists: Checks whether an Iceberg view exists.
  (2) loadView: Loads Iceberg view metadata.
- IcebergUtils.java
  Modified methods:
  (1) getSchema: Supports schema retrieval for both tables and views.
  (2) loadSchemaCacheValue: Handles schema caching for both types.
  New methods:
  (1) getIcebergView/getIcebergViewInternal: Retrieve Iceberg view objects.
  (2) getConvertedSchema: Converts an Iceberg schema to a Doris-compatible format.
- ExternalMetadataOps.java
  New interface methods:
  (1) loadView: Loads external view metadata.
  (2) viewExists: Checks view existence.
  (3) listViewNames: Lists available views.
- IcebergExternalCatalog.java
  (1) Modified listTableNames: Returns a combined list of tables and views.
  (2) Overridden viewExists: Implements the view existence check.
- ExternalCatalog.java
  (1) New interface method: viewExists, to check view existence.
- ExternalTable.java
  New methods:
  (1) isTable: Determines whether the object is a table (as opposed to a view).
  (2) getRemoteDbName: Retrieves the remote database name.
- HMSExternalTable.java
  Modified methods:
  (1) initSchema: Added view compatibility.
  (2) getIcebergSchema: Supports both tables and views.
  (3) Overridden isTable: Implements the table type check.
- IcebergApiSource.java
  (1) Constructor adjustment: Added view type validation to prevent invalid operations.
- IcebergTableSink.java
  (1) Constructor adjustment: Added view type validation to prevent invalid operations.
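To make the view cache changes listed for IcebergMetadataCache.java more concrete, here is a rough sketch of a view cache with catalog-, database-, and table-level invalidation. It is an illustration under stated assumptions, not the PR's implementation: `ViewCacheSketch`, its `Key` class, and the loader function are hypothetical, cached values are plain SQL strings rather than Iceberg view objects, and a `ConcurrentHashMap` is swapped in for whatever cache library the real class uses. Only the method names `getIcebergView`, `invalidateCatalogCache`, `invalidateDbCache`, and `invalidateTableCache` are taken from the list above.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical view cache keyed by (catalog id, database, view name). */
final class ViewCacheSketch {
    static final class Key {
        final long catalogId;
        final String db;
        final String view;

        Key(long catalogId, String db, String view) {
            this.catalogId = catalogId;
            this.db = db;
            this.view = view;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return catalogId == other.catalogId && db.equals(other.db) && view.equals(other.view);
        }

        @Override
        public int hashCode() {
            return Objects.hash(catalogId, db, view);
        }
    }

    private final Map<Key, String> viewCache = new ConcurrentHashMap<>();
    private final Function<Key, String> loader; // stand-in for loading view metadata from the catalog

    ViewCacheSketch(Function<Key, String> loader) {
        this.loader = loader;
    }

    /** Returns the cached view definition, loading it on first access. */
    String getIcebergView(long catalogId, String db, String view) {
        return viewCache.computeIfAbsent(new Key(catalogId, db, view), loader);
    }

    /** Drops every cached view that belongs to the catalog. */
    void invalidateCatalogCache(long catalogId) {
        viewCache.keySet().removeIf(k -> k.catalogId == catalogId);
    }

    /** Drops every cached view in one database of the catalog. */
    void invalidateDbCache(long catalogId, String db) {
        viewCache.keySet().removeIf(k -> k.catalogId == catalogId && k.db.equals(db));
    }

    /** Drops a single cached entry; tables and views share one namespace in this sketch. */
    void invalidateTableCache(long catalogId, String db, String name) {
        viewCache.remove(new Key(catalogId, db, name));
    }

    int size() {
        return viewCache.size();
    }
}
```

Keeping the view entries under the same three invalidation entry points as table entries means a refresh of a catalog, database, or table cannot leave a stale view definition behind.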
Does this need documentation? No.
run buildall
Cloud UT Coverage Report
Increment line coverage :tada:
| Category | Coverage |
|---|---|
| Function Coverage | 83.33% (1120/1344) |
| Line Coverage | 66.82% (19318/28909) |
| Region Coverage | 66.53% (9574/14390) |
| Branch Coverage | 56.53% (5210/9216) |
run buildall
Which TABLE_TYPE will an Iceberg view return when querying information_schema.tables?
run buildall
> Which TABLE_TYPE will an Iceberg view return when querying information_schema.tables?

It will return iceberg, the same as an Iceberg table.
Likewise, a Hive view returns hive, the same as a Hive table.
I will rethink this logic, since currently there is no standard
TPC-H: Total hot run time: 34056 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ff0969c50c761c36cd31b4690d39acd0d0e997b5, data reload: false
------ Round 1 ----------------------------------
q1 17639 5220 4989 4989
q2 1920 295 200 200
q3 10293 1306 738 738
q4 10234 1000 521 521
q5 7537 2344 2384 2344
q6 179 180 131 131
q7 904 751 606 606
q8 9324 1525 1128 1128
q9 6812 5116 5142 5116
q10 6897 2385 1958 1958
q11 499 280 271 271
q12 350 356 216 216
q13 17756 3652 3078 3078
q14 241 240 225 225
q15 542 480 470 470
q16 428 432 368 368
q17 591 851 371 371
q18 7579 7242 7127 7127
q19 1218 942 558 558
q20 337 337 224 224
q21 3760 3145 2443 2443
q22 1101 1030 974 974
Total cold run time: 106141 ms
Total hot run time: 34056 ms
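The report does not label its columns. The following sketch checks one reading of them against the Round 1 figures above: the first number per query is the cold run, the next two are hot runs, and the last column is the smaller of the two hot runs. Under that assumption the totals are the column sums. The class name `TpchTotalsSketch` is hypothetical, and the data is copied from the Round 1 rows.

```java
/** Recomputes the Round 1 totals, assuming columns are: cold run, hot run 1, hot run 2. */
final class TpchTotalsSketch {
    // One row per query q1..q22: {cold, hot1, hot2}, copied from the Round 1 report.
    static final int[][] ROUND1 = {
        {17639, 5220, 4989}, {1920, 295, 200}, {10293, 1306, 738}, {10234, 1000, 521},
        {7537, 2344, 2384}, {179, 180, 131}, {904, 751, 606}, {9324, 1525, 1128},
        {6812, 5116, 5142}, {6897, 2385, 1958}, {499, 280, 271}, {350, 356, 216},
        {17756, 3652, 3078}, {241, 240, 225}, {542, 480, 470}, {428, 432, 368},
        {591, 851, 371}, {7579, 7242, 7127}, {1218, 942, 558}, {337, 337, 224},
        {3760, 3145, 2443}, {1101, 1030, 974},
    };

    /** Sum of the first (cold) run of every query, in milliseconds. */
    static int totalCold(int[][] rows) {
        int sum = 0;
        for (int[] row : rows) {
            sum += row[0];
        }
        return sum;
    }

    /** Sum of the faster of the two hot runs of every query, in milliseconds. */
    static int totalHot(int[][] rows) {
        int sum = 0;
        for (int[] row : rows) {
            sum += Math.min(row[1], row[2]);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("Total cold run time: " + totalCold(ROUND1) + " ms"); // 106141 ms
        System.out.println("Total hot run time: " + totalHot(ROUND1) + " ms");   // 34056 ms
    }
}
```

Both computed totals match the reported lines, which supports this reading of the columns for the TPC-H tables.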
----- Round 2, with runtime_filter_mode=off -----
q1 5462 5035 5041 5035
q2 249 324 226 226
q3 2150 2667 2289 2289
q4 1333 1798 1349 1349
q5 4259 4092 4447 4092
q6 209 167 130 130
q7 2022 1910 1780 1780
q8 2628 2533 2548 2533
q9 7143 7188 7083 7083
q10 3034 3293 2888 2888
q11 570 507 489 489
q12 686 760 625 625
q13 3446 3856 3325 3325
q14 295 294 264 264
q15 527 479 480 479
q16 435 477 449 449
q17 1177 1481 1421 1421
q18 7766 7610 7384 7384
q19 812 825 884 825
q20 1974 1980 1888 1888
q21 4989 4491 4335 4335
q22 1092 1061 994 994
Total cold run time: 52258 ms
Total hot run time: 49883 ms
TPC-DS: Total hot run time: 185762 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ff0969c50c761c36cd31b4690d39acd0d0e997b5, data reload: false
query1 999 394 382 382
query2 6532 1832 1866 1832
query3 6745 219 232 219
query4 26341 23481 23509 23481
query5 4353 628 467 467
query6 317 226 199 199
query7 4637 486 288 288
query8 267 234 223 223
query9 8647 2637 2648 2637
query10 491 358 279 279
query11 15802 15287 14868 14868
query12 161 113 107 107
query13 1664 520 412 412
query14 10063 6123 6102 6102
query15 200 190 176 176
query16 7204 654 496 496
query17 1206 726 592 592
query18 1996 436 310 310
query19 200 190 171 171
query20 128 122 130 122
query21 220 133 108 108
query22 4062 4135 3990 3990
query23 34115 33102 33023 33023
query24 7747 2383 2365 2365
query25 551 459 393 393
query26 1244 267 151 151
query27 2645 478 345 345
query28 4379 2138 2114 2114
query29 762 576 425 425
query30 280 222 192 192
query31 946 870 762 762
query32 73 65 64 64
query33 568 372 317 317
query34 805 829 529 529
query35 778 822 745 745
query36 940 1021 883 883
query37 111 96 83 83
query38 4076 4079 4036 4036
query39 1499 1429 1416 1416
query40 214 123 110 110
query41 64 60 61 60
query42 134 108 108 108
query43 503 504 480 480
query44 1319 812 825 812
query45 182 167 168 167
query46 847 1016 620 620
query47 1782 1804 1706 1706
query48 398 412 330 330
query49 747 490 395 395
query50 628 674 405 405
query51 4101 4142 4197 4142
query52 111 109 96 96
query53 218 248 182 182
query54 576 569 505 505
query55 82 86 81 81
query56 307 295 278 278
query57 1193 1200 1130 1130
query58 278 266 249 249
query59 2697 2736 2641 2641
query60 322 317 302 302
query61 126 156 126 126
query62 797 717 685 685
query63 229 191 191 191
query64 4388 1036 667 667
query65 4250 4175 4229 4175
query66 1161 406 320 320
query67 15662 15446 15218 15218
query68 8254 877 511 511
query69 472 298 271 271
query70 1189 1117 1082 1082
query71 485 336 308 308
query72 5698 4725 4647 4647
query73 696 583 353 353
query74 9209 9071 8721 8721
query75 3972 3191 2711 2711
query76 3725 1200 764 764
query77 869 369 291 291
query78 10034 10053 9329 9329
query79 2664 784 575 575
query80 627 537 441 441
query81 501 265 234 234
query82 684 131 101 101
query83 294 244 236 236
query84 293 113 93 93
query85 781 352 364 352
query86 395 311 309 309
query87 4393 4446 4450 4446
query88 3678 2279 2272 2272
query89 393 310 287 287
query90 1846 210 206 206
query91 148 147 115 115
query92 75 66 58 58
query93 1753 935 572 572
query94 669 414 304 304
query95 373 297 284 284
query96 509 560 284 284
query97 2719 2812 2608 2608
query98 244 205 206 205
query99 1460 1421 1307 1307
Total cold run time: 276172 ms
Total hot run time: 185762 ms
ClickBench: Total hot run time: 29.74 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ff0969c50c761c36cd31b4690d39acd0d0e997b5, data reload: false
query1 0.04 0.04 0.03
query2 0.07 0.04 0.04
query3 0.23 0.07 0.07
query4 1.62 0.10 0.10
query5 0.45 0.42 0.42
query6 1.20 0.64 0.67
query7 0.03 0.01 0.01
query8 0.05 0.04 0.04
query9 0.56 0.52 0.51
query10 0.57 0.58 0.56
query11 0.15 0.11 0.11
query12 0.15 0.12 0.12
query13 0.63 0.61 0.62
query14 0.80 0.82 0.85
query15 0.89 0.88 0.86
query16 0.40 0.37 0.38
query17 1.10 1.04 1.04
query18 0.23 0.22 0.21
query19 1.93 1.85 1.86
query20 0.02 0.01 0.01
query21 15.40 0.90 0.56
query22 0.77 1.22 0.66
query23 14.90 1.37 0.63
query24 7.40 1.43 0.91
query25 0.50 0.21 0.08
query26 0.61 0.16 0.14
query27 0.07 0.05 0.05
query28 9.69 0.87 0.45
query29 12.56 4.02 3.34
query30 0.26 0.09 0.06
query31 2.83 0.60 0.40
query32 3.26 0.56 0.48
query33 3.15 3.12 3.09
query34 16.00 5.38 4.81
query35 4.85 4.83 4.90
query36 0.68 0.50 0.49
query37 0.08 0.06 0.06
query38 0.05 0.04 0.04
query39 0.03 0.02 0.02
query40 0.18 0.15 0.15
query41 0.09 0.02 0.02
query42 0.03 0.03 0.02
query43 0.04 0.04 0.03
Total cold run time: 104.55 s
Total hot run time: 29.74 s
FE UT Coverage Report
Increment line coverage 4.29% (9/210) :tada:
run buildall
TPC-H: Total hot run time: 33918 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 79097b1953e6a9f086a82d25d8683f1abf48adfe, data reload: false
------ Round 1 ----------------------------------
q1 17583 5231 4997 4997
q2 1929 279 187 187
q3 10380 1284 728 728
q4 10287 1005 523 523
q5 8399 2450 2339 2339
q6 187 162 132 132
q7 922 751 632 632
q8 9346 1334 1064 1064
q9 6808 5037 5079 5037
q10 6887 2384 1955 1955
q11 494 297 277 277
q12 343 353 212 212
q13 17775 3708 3134 3134
q14 231 227 212 212
q15 564 474 479 474
q16 424 440 378 378
q17 583 860 350 350
q18 7689 7128 7128 7128
q19 1650 942 563 563
q20 338 340 216 216
q21 3872 3191 2433 2433
q22 1051 1017 947 947
Total cold run time: 107742 ms
Total hot run time: 33918 ms
----- Round 2, with runtime_filter_mode=off -----
q1 5086 5030 5071 5030
q2 242 325 223 223
q3 2197 2705 2312 2312
q4 1346 1872 1355 1355
q5 4261 4129 4491 4129
q6 219 171 129 129
q7 2030 1934 1785 1785
q8 2654 2634 2503 2503
q9 7146 7099 7154 7099
q10 3110 3334 2809 2809
q11 588 517 502 502
q12 690 773 638 638
q13 3494 3910 3281 3281
q14 287 297 268 268
q15 509 483 471 471
q16 455 487 431 431
q17 1143 1577 1345 1345
q18 7828 7452 7314 7314
q19 794 794 978 794
q20 1981 2132 1892 1892
q21 5445 4520 4183 4183
q22 1018 1001 973 973
Total cold run time: 52523 ms
Total hot run time: 49466 ms
TPC-DS: Total hot run time: 187047 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 79097b1953e6a9f086a82d25d8683f1abf48adfe, data reload: false
query1 1006 401 393 393
query2 6542 1929 1931 1929
query3 6750 229 221 221
query4 26337 24092 23710 23710
query5 4343 643 497 497
query6 306 214 203 203
query7 4634 495 305 305
query8 270 237 220 220
query9 8613 2650 2660 2650
query10 483 338 273 273
query11 15882 15152 14874 14874
query12 171 111 110 110
query13 1658 538 409 409
query14 9068 6223 6019 6019
query15 208 198 186 186
query16 7164 608 461 461
query17 1192 732 567 567
query18 1981 402 314 314
query19 192 190 182 182
query20 129 122 114 114
query21 214 121 117 117
query22 4079 4083 4043 4043
query23 34013 33095 33124 33095
query24 8431 2371 2434 2371
query25 564 490 402 402
query26 1220 266 153 153
query27 2764 510 352 352
query28 4318 2139 2122 2122
query29 780 560 455 455
query30 288 223 194 194
query31 925 855 746 746
query32 74 63 93 63
query33 546 365 315 315
query34 808 856 559 559
query35 787 810 767 767
query36 973 998 886 886
query37 111 99 82 82
query38 4093 4252 4000 4000
query39 1475 1415 1412 1412
query40 215 118 107 107
query41 63 61 59 59
query42 132 111 123 111
query43 510 528 491 491
query44 1362 822 824 822
query45 205 173 171 171
query46 850 1039 633 633
query47 1755 1755 1717 1717
query48 386 440 318 318
query49 768 480 401 401
query50 682 689 407 407
query51 4172 4264 4011 4011
query52 114 108 97 97
query53 226 255 180 180
query54 577 574 508 508
query55 88 80 83 80
query56 300 305 299 299
query57 1183 1204 1130 1130
query58 277 258 266 258
query59 2641 2727 2733 2727
query60 338 332 317 317
query61 130 124 126 124
query62 800 729 648 648
query63 238 197 197 197
query64 4458 1102 765 765
query65 4284 4216 4216 4216
query66 1171 410 311 311
query67 15849 15663 15470 15470
query68 7766 885 532 532
query69 472 318 265 265
query70 1151 1133 1136 1133
query71 429 335 293 293
query72 5511 4783 4792 4783
query73 649 633 351 351
query74 9270 9174 8785 8785
query75 3508 3213 2736 2736
query76 3372 1212 785 785
query77 751 420 313 313
query78 10064 10098 9365 9365
query79 2292 858 595 595
query80 599 519 454 454
query81 501 273 227 227
query82 449 131 101 101
query83 258 255 240 240
query84 248 112 91 91
query85 798 348 318 318
query86 383 306 279 279
query87 4485 4577 4430 4430
query88 3981 2280 2286 2280
query89 387 353 287 287
query90 1853 210 213 210
query91 141 146 116 116
query92 79 66 61 61
query93 1875 958 594 594
query94 695 416 314 314
query95 375 299 293 293
query96 501 575 283 283
query97 2744 2766 2686 2686
query98 238 217 205 205
query99 1469 1383 1296 1296
Total cold run time: 274460 ms
Total hot run time: 187047 ms
FE UT Coverage Report
Increment line coverage 4.57% (9/197) :tada:
run buildall
TPC-H: Total hot run time: 34185 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f105dffb4303ac09a24b860f39a5338d1bf55837, data reload: false
------ Round 1 ----------------------------------
q1 17624 5195 5096 5096
q2 1938 308 197 197
q3 10257 1303 731 731
q4 10217 1012 533 533
q5 7497 2363 2369 2363
q6 185 160 137 137
q7 899 770 601 601
q8 9315 1296 1164 1164
q9 6625 5070 5115 5070
q10 6917 2376 1964 1964
q11 496 290 278 278
q12 351 344 219 219
q13 17785 3669 3112 3112
q14 242 234 218 218
q15 553 476 482 476
q16 439 425 380 380
q17 624 905 386 386
q18 7735 7177 7012 7012
q19 1615 959 591 591
q20 345 338 234 234
q21 4052 3358 2452 2452
q22 1032 1039 971 971
Total cold run time: 106743 ms
Total hot run time: 34185 ms
----- Round 2, with runtime_filter_mode=off -----
q1 5236 5126 5178 5126
q2 249 327 223 223
q3 2258 2700 2359 2359
q4 1353 1821 1376 1376
q5 4204 4161 4390 4161
q6 256 179 137 137
q7 1975 1943 1758 1758
q8 2637 2592 2538 2538
q9 7063 6976 7035 6976
q10 3092 3268 2816 2816
q11 581 504 498 498
q12 702 758 613 613
q13 3949 3928 3320 3320
q14 278 304 281 281
q15 519 475 465 465
q16 435 501 433 433
q17 1147 1504 1373 1373
q18 7284 7028 6942 6942
q19 786 792 943 792
q20 1973 1928 1802 1802
q21 4739 4335 4281 4281
q22 1078 1065 998 998
Total cold run time: 51794 ms
Total hot run time: 49268 ms
TPC-DS: Total hot run time: 185875 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f105dffb4303ac09a24b860f39a5338d1bf55837, data reload: false
query1 979 404 391 391
query2 6511 1884 1873 1873
query3 6744 231 224 224
query4 26463 23718 23031 23031
query5 4383 652 488 488
query6 306 215 211 211
query7 4629 487 305 305
query8 268 232 214 214
query9 8623 2623 2640 2623
query10 479 328 284 284
query11 15754 15190 14973 14973
query12 181 116 107 107
query13 1656 549 438 438
query14 9781 6134 6294 6134
query15 205 198 173 173
query16 7643 632 448 448
query17 1195 686 569 569
query18 2005 404 300 300
query19 190 187 157 157
query20 117 113 115 113
query21 216 133 109 109
query22 3983 4255 4053 4053
query23 34025 32986 33217 32986
query24 8392 2368 2413 2368
query25 535 469 395 395
query26 1231 278 154 154
query27 2742 515 374 374
query28 4316 2113 2085 2085
query29 738 541 438 438
query30 283 220 197 197
query31 928 829 754 754
query32 69 69 61 61
query33 561 391 339 339
query34 806 849 543 543
query35 802 822 764 764
query36 959 1004 862 862
query37 114 107 83 83
query38 4072 4086 4040 4040
query39 1489 1423 1390 1390
query40 217 121 118 118
query41 64 61 63 61
query42 132 111 113 111
query43 496 519 482 482
query44 1360 847 829 829
query45 182 172 169 169
query46 864 1040 627 627
query47 1728 1755 1675 1675
query48 385 425 317 317
query49 732 485 410 410
query50 662 693 423 423
query51 4108 4156 4141 4141
query52 116 111 108 108
query53 231 261 191 191
query54 582 585 527 527
query55 90 92 91 91
query56 325 307 321 307
query57 1211 1176 1117 1117
query58 260 259 257 257
query59 2686 2713 2583 2583
query60 333 332 317 317
query61 130 123 121 121
query62 817 709 683 683
query63 234 196 187 187
query64 4243 1026 681 681
query65 4225 4177 4179 4177
query66 1084 410 335 335
query67 15730 15447 15331 15331
query68 6550 899 539 539
query69 491 305 280 280
query70 1197 1108 1111 1108
query71 419 337 314 314
query72 5153 4655 4564 4564
query73 622 573 357 357
query74 9331 9094 8977 8977
query75 3180 3194 2733 2733
query76 3275 1192 780 780
query77 483 383 293 293
query78 10023 10176 9256 9256
query79 1700 784 599 599
query80 675 502 461 461
query81 510 265 220 220
query82 184 128 97 97
query83 257 251 237 237
query84 254 111 91 91
query85 754 363 360 360
query86 356 331 298 298
query87 4364 4506 4344 4344
query88 2910 2280 2306 2280
query89 399 313 280 280
query90 1738 210 209 209
query91 145 144 115 115
query92 68 62 58 58
query93 1040 945 599 599
query94 624 416 319 319
query95 381 288 295 288
query96 492 568 284 284
query97 2704 2760 2645 2645
query98 233 204 210 204
query99 1323 1399 1259 1259
Total cold run time: 269435 ms
Total hot run time: 185875 ms
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.