[improve](simd-json-reader) fix simd json reader lose data and support stream parser
Proposed changes
When JSON is loaded without setting read_json_by_line, only one JSON document is loaded.
If the input contains more than one JSON document, the remaining documents are dropped, so data is lost during the load.
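The data-loss scenario and the "stream parser" idea from the title can be illustrated with a minimal sketch. This is not the Doris or simdjson code; it uses Python's standard `json` module, and the function name `iter_json_documents` and the sample payload are hypothetical. It only shows the general technique of reading every top-level JSON document from one input instead of stopping after the first.

```python
import json


def iter_json_documents(text):
    """Yield every top-level JSON value found in `text`, in order."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip the
        # separators (newlines, spaces) between documents ourselves.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        # raw_decode returns the parsed value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Hypothetical input: three JSON documents in one payload.
payload = '{"id": 1, "city": "beijing"}\n{"id": 2, "city": "shanghai"} {"id": 3, "city": "shenzhen"}'

# Reading a single document stops after the first object (the lossy behavior).
first_only, _ = json.JSONDecoder().raw_decode(payload)

# Streaming over the payload recovers all of them.
rows = list(iter_json_documents(payload))
```

Here `first_only` holds just the first object, while `rows` holds all three, which is the difference between the behavior described above and a streaming read.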
run buildall
clang-tidy review says "All clean, LGTM! :+1:"
TeamCity be ut coverage result:
Function Coverage: 36.75% (8410/22883)
Line Coverage: 29.27% (68395/233666)
Region Coverage: 27.86% (35344/126882)
Branch Coverage: 24.62% (18064/73366)
Coverage Report: http://coverage.selectdb-in.cc/coverage/8b468bcb5170c80a8af43123c05a08d02e27b480_8b468bcb5170c80a8af43123c05a08d02e27b480/report/index.html
TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 8b468bcb5170c80a8af43123c05a08d02e27b480, data reload: false
run tpch-sf100 query with default conf and session variables
query  cold(ms)  hot1(ms)  hot2(ms)  best_hot(ms)
q1 4950 4696 4656 4656
q2 353 161 159 159
q3 2018 1897 1876 1876
q4 1392 1263 1229 1229
q5 3951 3939 4090 3939
q6 260 128 130 128
q7 1394 876 879 876
q8 2781 2787 2763 2763
q9 9698 9648 9630 9630
q10 3473 3506 3540 3506
q11 370 252 248 248
q12 438 295 303 295
q13 4545 3823 3815 3815
q14 336 290 288 288
q15 587 556 521 521
q16 663 581 576 576
q17 1133 962 955 955
q18 7718 7304 7427 7304
q19 1676 1682 1682 1682
q20 529 315 298 298
q21 4456 3949 4018 3949
q22 481 366 367 366
Total cold run time: 53202 ms
Total hot run time: 49059 ms
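The relationship between the rows and the two totals above can be checked with a short sketch. The column interpretation (first value is the cold run, the next two are hot runs, the last is the smaller of the two hot runs) is an inference from the numbers, not something the CI bot states; the function name `summarize` is hypothetical.

```python
# Rows copied from the first tpch-sf100 table (default conf and session variables).
TABLE = """\
q1 4950 4696 4656 4656
q2 353 161 159 159
q3 2018 1897 1876 1876
q4 1392 1263 1229 1229
q5 3951 3939 4090 3939
q6 260 128 130 128
q7 1394 876 879 876
q8 2781 2787 2763 2763
q9 9698 9648 9630 9630
q10 3473 3506 3540 3506
q11 370 252 248 248
q12 438 295 303 295
q13 4545 3823 3815 3815
q14 336 290 288 288
q15 587 556 521 521
q16 663 581 576 576
q17 1133 962 955 955
q18 7718 7304 7427 7304
q19 1676 1682 1682 1682
q20 529 315 298 298
q21 4456 3949 4018 3949
q22 481 366 367 366
"""


def summarize(table_text):
    """Return (total cold ms, total best-hot ms) for a flattened result table."""
    cold_total = 0
    hot_total = 0
    for line in table_text.strip().splitlines():
        name, *values = line.split()
        cold, hot1, hot2, best = map(int, values)
        # Assumed meaning of the last column: the faster of the two hot runs.
        if best != min(hot1, hot2):
            raise ValueError(f"{name}: last column is not min of hot runs")
        cold_total += cold
        hot_total += best
    return cold_total, hot_total


cold_ms, hot_ms = summarize(TABLE)
print(f"Total cold run time: {cold_ms} ms")
print(f"Total hot run time: {hot_ms} ms")
```

Under that reading, the sums reproduce the reported totals of 53202 ms and 49059 ms for this table.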
run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query  cold(ms)  hot1(ms)  hot2(ms)  best_hot(ms)
q1 4625 4608 4566 4566
q2 335 248 276 248
q3 4046 4009 3998 3998
q4 2721 2706 2714 2706
q5 9712 9671 9686 9671
q6 251 123 124 123
q7 2601 2311 2238 2238
q8 4421 4438 4425 4425
q9 13196 13105 13188 13105
q10 4083 4191 4195 4191
q11 811 662 653 653
q12 985 805 807 805
q13 4309 3562 3568 3562
q14 390 361 366 361
q15 566 522 519 519
q16 758 663 658 658
q17 3849 3849 3787 3787
q18 9445 8991 8998 8991
q19 1831 1768 1773 1768
q20 2401 2049 2028 2028
q21 8943 8536 8594 8536
q22 880 825 810 810
Total cold run time: 81159 ms
Total hot run time: 77749 ms
run buildall
clang-tidy review says "All clean, LGTM! :+1:"
TeamCity be ut coverage result:
Function Coverage: 36.74% (8413/22898)
Line Coverage: 29.26% (68434/233898)
Region Coverage: 27.84% (35375/127060)
Branch Coverage: 24.60% (18070/73468)
Coverage Report: http://coverage.selectdb-in.cc/coverage/8e38c25e0d50b56a352a12d00caf3fe542e6492a_8e38c25e0d50b56a352a12d00caf3fe542e6492a/report/index.html
TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 8e38c25e0d50b56a352a12d00caf3fe542e6492a, data reload: false
run tpch-sf100 query with default conf and session variables
query  cold(ms)  hot1(ms)  hot2(ms)  best_hot(ms)
q1 4971 4693 4679 4679
q2 364 152 158 152
q3 2036 1946 1883 1883
q4 1377 1249 1273 1249
q5 3983 3972 4028 3972
q6 243 134 136 134
q7 1379 853 879 853
q8 2778 2829 2797 2797
q9 9837 9793 9658 9658
q10 3447 3534 3535 3534
q11 371 250 251 250
q12 440 295 295 295
q13 4564 3799 3775 3775
q14 323 294 298 294
q15 586 529 524 524
q16 672 590 585 585
q17 1158 945 940 940
q18 7895 7425 7339 7339
q19 1681 1679 1674 1674
q20 522 311 298 298
q21 4435 3990 3965 3965
q22 483 382 375 375
Total cold run time: 53545 ms
Total hot run time: 49225 ms
run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query  cold(ms)  hot1(ms)  hot2(ms)  best_hot(ms)
q1 4613 4585 4594 4585
q2 347 220 252 220
q3 4043 3994 4007 3994
q4 2714 2714 2693 2693
q5 9739 9594 9648 9594
q6 242 123 125 123
q7 2588 2216 2290 2216
q8 4465 4480 4489 4480
q9 13222 13132 13072 13072
q10 4090 4203 4226 4203
q11 788 639 681 639
q12 977 806 813 806
q13 4282 3553 3585 3553
q14 384 349 348 348
q15 569 520 533 520
q16 746 660 649 649
q17 3854 3858 3804 3804
q18 9658 9026 9139 9026
q19 1824 1800 1785 1785
q20 2401 2067 2067 2067
q21 8815 8419 8434 8419
q22 928 812 824 812
Total cold run time: 81289 ms
Total hot run time: 77608 ms
(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.52 seconds
stream load tsv: 576 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17099573475 Bytes
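The "about N MB/s" figures in the clickbench report can be reproduced from the byte counts and durations it lists. The sketch below assumes that "MB" here means 2^20 bytes and that the result is truncated to an integer; both are inferences from the reported numbers, not documented behavior of the pipeline. The helper name `throughput_mib_per_s` is hypothetical.

```python
MIB = 2 ** 20  # assumed unit behind the report's "MB"


def throughput_mib_per_s(total_bytes, seconds):
    """Whole MiB per second, truncated (assumed rounding)."""
    return total_bytes // (seconds * MIB)


# (bytes loaded, seconds) as listed in the report above.
stream_loads = {
    "tsv": (74807831229, 576),
    "json": (2358488459, 18),
    "orc": (1101869774, 65),
    "parquet": (861443392, 32),
}

rates = {fmt: throughput_mib_per_s(b, s) for fmt, (b, s) in stream_loads.items()}

# insert into select: rows per second, expressed in thousands and truncated.
insert_kops = int(10_000_000 / 28.6 / 1000)

for fmt, rate in rates.items():
    print(f"stream load {fmt}: about {rate} MB/s")
print(f"insert into select: about {insert_kops}K ops/s")
```

With these assumptions the computed values match the reported 123, 124, 16 and 25 MB/s and 349K ops/s.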
run buildall
clang-tidy review says "All clean, LGTM! :+1:"
TeamCity be ut coverage result:
Function Coverage: 36.76% (8429/22927)
Line Coverage: 29.28% (68593/234291)
Region Coverage: 27.86% (35462/127264)
Branch Coverage: 24.59% (18096/73580)
Coverage Report: http://coverage.selectdb-in.cc/coverage/065080943cb0bc531821fc96d5718aa1f8ccfb04_065080943cb0bc531821fc96d5718aa1f8ccfb04/report/index.html
TeamCity be ut coverage result:
Function Coverage: 36.78% (8432/22927)
Line Coverage: 29.28% (68602/234291)
Region Coverage: 27.86% (35459/127264)
Branch Coverage: 24.59% (18094/73580)
Coverage Report: http://coverage.selectdb-in.cc/coverage/6879db1185649e1f28f7933c39301ffd78cdffeb_6879db1185649e1f28f7933c39301ffd78cdffeb/report/index.html
TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 6879db1185649e1f28f7933c39301ffd78cdffeb, data reload: false
run tpch-sf100 query with default conf and session variables
query  cold(ms)  hot1(ms)  hot2(ms)  best_hot(ms)
q1 4966 4698 4707 4698
q2 362 145 158 145
q3 2037 1899 1891 1891
q4 1381 1235 1230 1230
q5 3973 3866 3960 3866
q6 250 133 132 132
q7 1429 890 890 890
q8 2774 2782 2765 2765
q9 9523 19371 9494 9494
q10 3440 3492 3498 3492
q11 379 242 250 242
q12 450 291 297 291
q13 4553 3838 3776 3776
q14 309 294 285 285
q15 574 528 532 528
q16 664 586 582 582
q17 1137 982 921 921
q18 7810 7356 7445 7356
q19 1694 1693 1674 1674
q20 523 319 286 286
q21 4457 4000 4046 4000
q22 480 368 377 368
Total cold run time: 53165 ms
Total hot run time: 48912 ms
run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query  cold(ms)  hot1(ms)  hot2(ms)  best_hot(ms)
q1 4569 4556 4620 4556
q2 345 224 263 224
q3 4014 4025 3990 3990
q4 2714 2711 2709 2709
q5 9629 9575 9652 9575
q6 245 121 124 121
q7 3036 2509 2491 2491
q8 4478 4442 4439 4439
q9 12865 12824 12753 12753
q10 4070 4182 4150 4150
q11 777 721 651 651
q12 982 815 824 815
q13 4307 3545 3523 3523
q14 373 370 356 356
q15 575 517 518 517
q16 731 664 659 659
q17 3893 3885 3838 3838
q18 9626 9053 9092 9053
q19 1828 1750 1765 1750
q20 2373 2054 2019 2019
q21 8713 8762 8614 8614
q22 874 759 746 746
Total cold run time: 81017 ms
Total hot run time: 77549 ms
(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.12 seconds
stream load tsv: 579 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17101511571 Bytes
(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.87 seconds
stream load tsv: 580 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.0 seconds inserted 10000000 Rows, about 357K ops/s
storage size: 17100227694 Bytes
run buildall
clang-tidy review says "All clean, LGTM! :+1:"
TeamCity be ut coverage result:
Function Coverage: 36.75% (8432/22946)
Line Coverage: 29.25% (68587/234499)
Region Coverage: 27.86% (35458/127294)
Branch Coverage: 24.58% (18089/73578)
Coverage Report: http://coverage.selectdb-in.cc/coverage/8e32fad60e763d2c0ae7f22c6fef0e314ea63382_8e32fad60e763d2c0ae7f22c6fef0e314ea63382/report/index.html
TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 8e32fad60e763d2c0ae7f22c6fef0e314ea63382, data reload: false
run tpch-sf100 query with default conf and session variables
query  cold(ms)  hot1(ms)  hot2(ms)  best_hot(ms)
q1 4922 4651 4651 4651
q2 372 142 168 142
q3 2032 1879 1933 1879
q4 1380 1298 1275 1275
q5 3962 3951 4052 3951
q6 253 129 133 129
q7 1439 887 893 887
q8 2793 2789 2770 2770
q9 9725 9432 9607 9432
q10 3475 3495 3525 3495
q11 372 249 238 238
q12 441 292 292 292
q13 4580 3844 3778 3778
q14 326 298 292 292
q15 567 527 528 527
q16 672 597 586 586
q17 1133 969 961 961
q18 7899 7416 7386 7386
q19 1683 1680 1651 1651
q20 543 303 290 290
q21 4450 4018 4036 4018
q22 480 364 367 364
Total cold run time: 53499 ms
Total hot run time: 48994 ms
run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query  cold(ms)  hot1(ms)  hot2(ms)  best_hot(ms)
q1 4576 4602 4543 4543
q2 337 231 233 231
q3 4038 4042 4017 4017
q4 2715 2714 2703 2703
q5 9578 9599 9562 9562
q6 247 130 125 125
q7 3027 2500 2502 2500
q8 4410 4421 4445 4421
q9 12935 12927 12899 12899
q10 4063 4159 4136 4136
q11 803 643 728 643
q12 970 831 816 816
q13 4309 3595 3528 3528
q14 384 345 346 345
q15 577 523 522 522
q16 732 671 669 669
q17 3960 3829 3859 3829
q18 9536 8990 9152 8990
q19 1787 1760 1785 1760
q20 2402 2059 2047 2047
q21 8953 8581 8726 8581
q22 875 837 774 774
Total cold run time: 81214 ms
Total hot run time: 77641 ms
(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.76 seconds
stream load tsv: 563 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17098936913 Bytes
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
clang-tidy review says "All clean, LGTM! :+1:"