
[enhance](catalog)show partitions support iceberg

Open zddr opened this issue 1 year ago • 7 comments

Proposed changes

Issue Number: close #xxx

Support show partitions from iceberg_table. For an explanation of the output fields, see https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/PartitionsTable.java

mysql> show partitions from sample_cow_parquet\G
*************************** 1. row ***************************
                partition: col_timestamp_day=1970-01-04/city=Hefei/id_bucket=1
                   specId: 0
              recordCount: 36
                fileCount: 1
 totalDataFileSizeInBytes: 7717
positionDeleteRecordCount: 0
  positionDeleteFileCount: 0
equalityDeleteRecordCount: 0
  equalityDeleteFileCount: 0
            lastUpdatedAt: 1697163408768000
    lastUpdatedSnapshotId: 1704258583567025327
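
As a reading aid for the row above, here is a minimal Python sketch that splits the partition value into its key/value pairs and converts lastUpdatedAt to a readable time. It assumes lastUpdatedAt is microseconds since the Unix epoch (the 16-digit magnitude suggests this, but the PR does not state the unit) and that partition values contain no unescaped "/" or "=". The helper names parse_partition and micros_to_utc are hypothetical and are not part of Doris or Iceberg.

```python
from datetime import datetime, timezone


def parse_partition(path: str) -> dict[str, str]:
    """Split 'k1=v1/k2=v2' into {'k1': 'v1', 'k2': 'v2'}.

    Assumes values contain no unescaped '/' or '='.
    """
    return dict(part.split("=", 1) for part in path.split("/"))


def micros_to_utc(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch (assumed unit) to UTC."""
    seconds, remainder = divmod(micros, 1_000_000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.replace(microsecond=remainder)


# Values copied from the sample row in the PR description.
fields = parse_partition("col_timestamp_day=1970-01-04/city=Hefei/id_bucket=1")
updated = micros_to_utc(1697163408768000)

print(fields)
print(updated.isoformat())  # 2023-10-13T02:16:48.768000+00:00
```

Under that assumption, the sample partition was last updated on 2023-10-13 at 02:16:48 UTC.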

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

zddr · May 24 '24 11:05

Thank you for your contribution to Apache Doris. Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website. See Doris Document.

doris-robot · May 24 '24 11:05

add use case in description

morrySnow · May 24 '24 11:05

run buildall

zddr · May 24 '24 11:05

TPC-H: Total hot run time: 40966 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 69ca8507ec953fbd1865d9783fbc7b6b7285dbc9, data reload: false

------ Round 1 ----------------------------------
q1	17991	4485	4416	4416
q2	2580	197	193	193
q3	11488	1240	1205	1205
q4	10522	842	833	833
q5	7863	2692	2778	2692
q6	220	140	135	135
q7	961	613	607	607
q8	9460	2089	2074	2074
q9	8768	6487	6474	6474
q10	8939	3704	3707	3704
q11	451	259	234	234
q12	435	227	216	216
q13	17773	3002	2976	2976
q14	263	219	218	218
q15	516	462	465	462
q16	503	374	389	374
q17	952	670	667	667
q18	8073	7521	7460	7460
q19	7409	1559	1479	1479
q20	659	304	314	304
q21	4912	4005	3967	3967
q22	343	276	279	276
Total cold run time: 121081 ms
Total hot run time: 40966 ms
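
The bot output does not label its columns. The numbers are consistent with the layout: query, cold run, two hot runs, best hot run. Under that reading, "Total cold run time" is the sum of the cold column and "Total hot run time" is the sum of the best-hot column. The Python sketch below checks this against the Round 1 rows. The column interpretation is an inference from the data, not something the report states, and the name summarize is hypothetical.

```python
# Round 1 rows copied from the report above: query, cold, hot1, hot2, best.
ROUND_1 = """\
q1 17991 4485 4416 4416
q2 2580 197 193 193
q3 11488 1240 1205 1205
q4 10522 842 833 833
q5 7863 2692 2778 2692
q6 220 140 135 135
q7 961 613 607 607
q8 9460 2089 2074 2074
q9 8768 6487 6474 6474
q10 8939 3704 3707 3704
q11 451 259 234 234
q12 435 227 216 216
q13 17773 3002 2976 2976
q14 263 219 218 218
q15 516 462 465 462
q16 503 374 389 374
q17 952 670 667 667
q18 8073 7521 7460 7460
q19 7409 1559 1479 1479
q20 659 304 314 304
q21 4912 4005 3967 3967
q22 343 276 279 276
"""


def summarize(report: str) -> tuple[int, int]:
    """Return (total cold ms, total best-hot ms) for a whitespace-separated report."""
    total_cold = 0
    total_hot = 0
    for line in report.splitlines():
        _name, cold, hot1, hot2, best = line.split()
        # Assumed layout: the last column is the faster of the two hot runs.
        assert int(best) == min(int(hot1), int(hot2))
        total_cold += int(cold)
        total_hot += int(best)
    return total_cold, total_hot


print(summarize(ROUND_1))  # (121081, 40966)
```

Both sums match the totals the bot reports for Round 1.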

----- Round 2, with runtime_filter_mode=off -----
q1	4324	4230	4213	4213
q2	359	269	271	269
q3	2990	2757	2738	2738
q4	1886	1684	1575	1575
q5	5293	5282	5298	5282
q6	214	126	128	126
q7	2120	1776	1690	1690
q8	3168	3342	3278	3278
q9	8351	8362	8290	8290
q10	3895	3665	3710	3665
q11	569	493	479	479
q12	744	605	570	570
q13	16550	3003	2990	2990
q14	295	272	261	261
q15	505	463	481	463
q16	465	417	430	417
q17	1781	1505	1477	1477
q18	7687	7638	7461	7461
q19	1643	1616	1575	1575
q20	1983	1795	1771	1771
q21	4845	4650	4786	4650
q22	555	501	496	496
Total cold run time: 70222 ms
Total hot run time: 53736 ms

doris-robot · May 24 '24 12:05

TPC-DS: Total hot run time: 167681 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 69ca8507ec953fbd1865d9783fbc7b6b7285dbc9, data reload: false

query1	917	378	365	365
query2	6452	2457	2351	2351
query3	6659	201	205	201
query4	19175	17485	17309	17309
query5	4132	414	420	414
query6	249	165	155	155
query7	4585	296	300	296
query8	237	185	186	185
query9	8498	2442	2400	2400
query10	458	288	280	280
query11	10593	10135	9990	9990
query12	134	89	87	87
query13	1629	355	363	355
query14	9990	6162	5872	5872
query15	204	177	173	173
query16	7194	278	259	259
query17	1358	542	535	535
query18	1791	267	268	267
query19	201	151	158	151
query20	96	84	83	83
query21	196	134	130	130
query22	4126	4020	3936	3936
query23	33431	32949	33078	32949
query24	11370	2826	2868	2826
query25	649	350	356	350
query26	1613	154	153	153
query27	3104	311	327	311
query28	7726	2064	2054	2054
query29	1074	620	582	582
query30	285	173	175	173
query31	971	734	733	733
query32	90	53	52	52
query33	769	263	259	259
query34	1026	457	471	457
query35	743	632	607	607
query36	1061	917	951	917
query37	153	70	73	70
query38	2881	2759	2785	2759
query39	853	793	780	780
query40	282	128	126	126
query41	49	45	49	45
query42	102	98	124	98
query43	594	562	526	526
query44	1157	734	744	734
query45	180	161	159	159
query46	1063	726	715	715
query47	1866	1785	1796	1785
query48	378	286	292	286
query49	1159	402	385	385
query50	776	389	397	389
query51	6685	6668	6520	6520
query52	102	91	88	88
query53	355	293	287	287
query54	952	424	412	412
query55	74	74	74	74
query56	259	244	263	244
query57	1140	1020	1066	1020
query58	236	239	216	216
query59	3434	3179	3104	3104
query60	275	254	258	254
query61	89	88	88	88
query62	657	453	452	452
query63	306	293	282	282
query64	9950	2240	1782	1782
query65	3173	3117	3132	3117
query66	1412	339	321	321
query67	15414	15000	14715	14715
query68	4582	544	543	543
query69	438	274	260	260
query70	1166	1085	1077	1077
query71	415	266	257	257
query72	7650	2693	2539	2539
query73	704	319	317	317
query74	5986	5618	5634	5618
query75	3333	2667	2602	2602
query76	2493	1041	962	962
query77	388	262	262	262
query78	10247	9594	9709	9594
query79	2084	521	511	511
query80	1181	437	431	431
query81	518	245	243	243
query82	645	92	96	92
query83	275	168	170	168
query84	236	84	85	84
query85	1691	265	268	265
query86	482	313	286	286
query87	3250	3116	3225	3116
query88	4210	2332	2332	2332
query89	484	398	392	392
query90	1974	190	183	183
query91	133	107	109	107
query92	55	48	51	48
query93	2352	517	502	502
query94	1227	197	193	193
query95	408	368	306	306
query96	588	268	267	267
query97	3227	3019	3084	3019
query98	244	224	220	220
query99	1221	835	869	835
Total cold run time: 272679 ms
Total hot run time: 167681 ms

doris-robot · May 24 '24 12:05

ClickBench: Total hot run time: 30.37 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 69ca8507ec953fbd1865d9783fbc7b6b7285dbc9, data reload: false

query1	0.04	0.04	0.03
query2	0.09	0.04	0.05
query3	0.23	0.05	0.05
query4	1.68	0.07	0.07
query5	0.48	0.51	0.50
query6	1.13	0.73	0.73
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.52	0.50	0.50
query10	0.55	0.56	0.54
query11	0.15	0.11	0.12
query12	0.15	0.12	0.12
query13	0.61	0.58	0.60
query14	0.78	0.78	0.80
query15	0.83	0.80	0.80
query16	0.36	0.36	0.37
query17	1.04	1.02	1.02
query18	0.22	0.24	0.26
query19	1.74	1.65	1.74
query20	0.02	0.01	0.00
query21	15.58	0.66	0.66
query22	4.79	7.52	1.69
query23	18.29	1.38	1.25
query24	1.74	0.24	0.25
query25	0.14	0.09	0.09
query26	0.26	0.17	0.16
query27	0.08	0.08	0.08
query28	13.30	1.02	1.00
query29	13.12	3.31	3.35
query30	0.24	0.06	0.05
query31	2.86	0.39	0.38
query32	3.30	0.48	0.47
query33	2.91	2.97	2.87
query34	16.91	4.43	4.42
query35	4.51	4.55	4.48
query36	0.65	0.46	0.47
query37	0.18	0.16	0.15
query38	0.16	0.14	0.14
query39	0.04	0.03	0.03
query40	0.15	0.14	0.14
query41	0.09	0.05	0.05
query42	0.05	0.04	0.05
query43	0.04	0.04	0.04
Total cold run time: 110.08 s
Total hot run time: 30.37 s

doris-robot · May 24 '24 12:05

Nice. Btw, should we also follow the Iceberg semi-convention and support metadata tables (e.g. Trino: select * from "table$partitions", or Spark: select * from prod.db.table.partitions, see https://iceberg.apache.org/docs/1.5.1/spark-queries/#partitions) in a separate PR?

Samrose-Ahmed · May 25 '24 02:05