databend-perf
[Tracking] Add hits dataset & queries
links:
- https://github.com/ClickHouse/ClickHouse/blob/484055984661a2ace7f6a62a1d9f9204ea7f04af/benchmark/compatible/databend/benchmark.sh
- https://datasets.clickhouse.com/hits_compatible/hits.csv.gz
- https://github.com/datafuselabs/databend/pull/6463/files
tasks:
- [x] https://github.com/datafuselabs/databend-perf/pull/81
- [x] add hits data into repo.databend.rs/hits/hits.tsv
- [x] add partitioned hits data into repo.databend.rs/hits_p/*.tsv
- [x] https://github.com/datafuselabs/databend/issues/6553
- [x] https://github.com/datafuselabs/databend/issues/6554
- [x] https://github.com/datafuselabs/databend-perf/pull/88
- [ ] add queries about hits
2022/07/08 13:32:30 the 0 time result has error query has error: &{Code:1046 Message:Parse csv error at line 93557185, cause: CSV error: record 93557186 (line: 93557187, byte: 69916103044): found record with 100 fields, but the previous record has 105 fields (while in processor thread 3) Kind:}, stats:
https://github.com/datafuselabs/databend-perf/runs/7251163192?check_suite_focus=true
There may be a bug in how the CSV file is parsed. The checks below show that lines 93557184 through 93557188 of hits.tsv all have 105 fields when split on tabs:
╭─yazhou@yazhous-MBP ~/Downloads
╰─$ sed -n '93557185p' < hits.tsv | ruby -e "puts STDIN.read.split(/\t/).size"
105
╭─yazhou@yazhous-MBP ~/Downloads
╰─$ sed -n '93557184p' < hits.tsv | ruby -e "puts STDIN.read.split(/\t/).size"
105
╭─yazhou@yazhous-MBP ~/Downloads
╰─$ sed -n '93557186p' < hits.tsv | ruby -e "puts STDIN.read.split(/\t/).size"
105
╭─yazhou@yazhous-MBP ~/Downloads
╰─$ sed -n '93557187p' < hits.tsv | ruby -e "puts STDIN.read.split(/\t/).size"
105
╭─yazhou@yazhous-MBP ~/Downloads
╰─$ sed -n '93557188p' < hits.tsv | ruby -e "puts STDIN.read.split(/\t/).size"
105
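The five one-line checks above can also be run in a single pass. The following is a minimal sketch, not part of the original investigation: `count_fields` is a hypothetical helper, and the demo runs on a tiny temporary file rather than on the real `hits.tsv`. Like the Ruby one-liners, it splits on tabs only and ignores quoting.

```python
import os
import tempfile
from itertools import islice

def count_fields(path, first, last, delimiter="\t"):
    """Return [(line_number, field_count)] for 1-based lines first..last inclusive."""
    # newline="\n" makes Python split lines only on "\n", as sed does.
    with open(path, encoding="utf-8", errors="replace", newline="\n") as handle:
        window = islice(handle, first - 1, last)
        return [
            (number, len(line.rstrip("\n").split(delimiter)))
            for number, line in enumerate(window, start=first)
        ]

# Demo on a tiny hypothetical file; for the real data the call would be
# count_fields("hits.tsv", 93557184, 93557188).
with tempfile.NamedTemporaryFile(
    "w", suffix=".tsv", delete=False, encoding="utf-8", newline="\n"
) as tmp:
    tmp.write("a\tb\tc\n1\t2\t3\n4\t5\n")
demo = count_fields(tmp.name, 2, 3)
os.remove(tmp.name)
print(demo)
```

Any line in the requested range whose count differs from 105 would stand out immediately in the returned list.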
╰─$ sed -n '93557185,93557189p' < hits.tsv
8816084601099399273 1 на люкс - Формула 1 из мясо 1 2013-07-21 11:13:35 2013-07-21 122458 1389213378 15225 7900892430046930758 0 44 3 http://ufa/deti74.ru 0 0 0 419 216 1917 554 23 15 7 700 0 0 17 D� 1 1 0 0 1365155 0 0 0 0 778 815 135 2013-07-20 20:42:03 0 0 0 0 windows 1601 0 00 5632622125792883430 736845179 0 0 0 0 0 g 2013-07-20 21:40:01 0 0 0 0 0 1984237733 5053 -1 1 S0 h1 0 0 0 0 37 9 79 0 0 NH 0 ad_cpamarket 30533 rwtr_cl_bu_compaign Other_cities 0 -296158784638538920 -8631670417943857411 0
4972262175479248633 1 на люкс - Формула 1 из мясо 1 2013-07-21 11:41:13 2013-07-21 122458 1389213378 15225 7900892430046930758 0 44 3 http://ufa/deti74.ru 0 0 0 419 216 1917 554 23 15 7 700 0 0 17 D� 1 1 0 0 1365155 0 0 0 0 778 815 135 2013-07-20 21:07:36 0 0 0 0 windows 1601 0 00 5632622125792883430 281031983 0 0 0 0 0 g 2013-07-20 22:00:41 0 0 0 0 0 1984237733 6821 -1 1 S0 h1 0 0 12 0 58 9 187 0 0 NH 0 ad_cpamarket 30533 rwtr_cl_bu_compaign резюме екатеринбург 0 -296158784638538920 -8631670417943857411 0
7163949730692512472 1 на люкс - Формула 1 из мясо 1 2013-07-21 11:43:02 2013-07-21 122458 1389213378 15225 7900892430046930758 0 44 3 http://ufa/deti74.ru 0 0 0 419 216 1917 554 23 15 7 700 0 0 17 D� 1 1 0 0 1365155 0 0 0 0 778 815 135 2013-07-20 21:09:41 0 0 0 0 windows 1601 0 00 5632622125792883430 561426345 0 0 0 0 0 g 2013-07-20 22:02:15 0 0 0 0 0 1984237733 36786 -1 1 S0 h1 0 0 0 0 44 12 149 0 0 NH 0 ad_cpamarket 30533 rwtr_cl_bu_compaign "tatuirovarki_redmond 70 0 -296158784638538920 -8631670417943857411 0
8049079189872617845 1 на люкс - Формула 1 из мясо 1 2013-07-21 11:46:20 2013-07-21 122458 1389213378 15225 7900892430046930758 0 44 3 http://ufa/deti74.ru 0 0 0 419 216 1917 554 23 15 7 700 0 0 17 D� 1 1 0 0 1365155 0 0 0 0 778 815 135 2013-07-20 21:13:39 0 0 0 0 windows 1601 0 00 5632622125792883430 806899868 0 0 0 0 0 g 2013-07-20 22:04:25 0 0 0 0 0 1984237733 45732 -1 1 S0 h1 0 0 0 0 46 14 187 0 0 NH 0 ad_cpamarket 30533 rwtr_cl_bu_compaign investirovat_na_denginessman 0 -296158784638538920 -8631670417943857411 0
7329647432517701398 1 на люкс - Формула 1 из мясо 1 2013-07-21 12:04:55 2013-07-21 122458 1389213378 15225 7900892430046930758 0 44 3 http://ufa/deti74.ru 0 0 0 419 216 1917 554 23 15 7 700 0 0 17 D� 1 1 0 0 1365155 0 0 0 0 778 815 135 2013-07-20 21:32:59 0 0 0 0 windows 1601 0 00 5632622125792883430 692849168 0 0 0 0 0 g 2013-07-20 22:22:58 0 0 0 0 0 1984237733 64518 -1 1 S0 h1 0 0 0 0 59 9 250 0 0 NH 0 ad_cpamarket 30533 rwtr_cl_bu_compaign 1028cba65195 0 -296158784638538920 -8631670417943857411 0
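One thing worth noting in the dump above: the third row contains a field that starts with a double quote (`"tatuirovarki_redmond`) and never closes it. A plain tab split does not care about quotes, but a quote-aware CSV parser may treat that as the start of a quoted field and swallow the following tabs and newlines. Whether this is what happens inside databend is an assumption, not something confirmed in this thread. The sketch below only illustrates the effect with Python's `csv` module on hypothetical 5-column rows.

```python
import csv
import io

def field_counts(text, quoting):
    """Return the number of fields in each record parsed from tab-separated text."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t", quoting=quoting)
    return [len(record) for record in reader]

# Hypothetical 5-column rows; the second one opens a quote and never closes it.
sample = (
    "1\ta\tb\tc\t0\n"
    '2\ta\t"tatuirovarki_redmond\tc\t0\n'
    "3\ta\tb\tc\t0\n"
)

# Plain split on tabs, as the ruby one-liners do: quotes are ordinary characters.
naive = [len(line.split("\t")) for line in sample.splitlines()]

# Quote-aware parsing: the unbalanced quote absorbs the rest of the input.
quote_aware = field_counts(sample, csv.QUOTE_MINIMAL)

# Quote handling switched off: every row keeps its 5 fields.
quote_blind = field_counts(sample, csv.QUOTE_NONE)

print(naive, quote_aware, quote_blind)
```

If the parser in use has an option to disable quoting for TSV input, the `quote_blind` case is the behaviour one would want for this dataset.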
Loading just this subset (saved as hits_bug.csv) produces an error:
COPY INTO hits FROM '@hits' files=('hits_bug.csv') file_format = (type = 'csv' field_delimiter = "\t" record_delimiter = "\n" skip_header = 0);
1046: Cannot parse value:Ok("5632622125792883430") to number type, cause: Overflow(5) (while in processor thread 0)
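As a quick sanity check on the overflow message, the sketch below tests which fixed-width integer types can hold the reported value. The helper `fits` and the list of widths are illustrative assumptions; the thread does not say which column or type the value was being parsed into.

```python
def fits(value, bits, signed):
    """Return True if value is representable in an integer of the given width."""
    if signed:
        return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1
    return 0 <= value <= (1 << bits) - 1

value = 5632622125792883430  # the value from the error message

# Map each type name to whether it can hold the value.
widths = {
    f"{'Int' if signed else 'UInt'}{bits}": fits(value, bits, signed)
    for bits in (8, 16, 32, 64)
    for signed in (True, False)
}

for name, ok in widths.items():
    print(f"{name:7s} {'fits' if ok else 'overflows'}")
```

The value is representable in both signed and unsigned 64-bit integers, so an overflow would be consistent with it being parsed into a narrower column than the one it belongs to, for example if the fields were shifted. That reading is a guess and would need to be checked against the table schema.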
However, the databend-query version used here is a bit old (v0.7.115-nightly). I'll retry with 0.7.121 using the same file.