databend-perf

[Tracking] Add hits dataset & queries

Open flaneur2020 opened this issue 2 years ago • 4 comments

links:

  • https://github.com/ClickHouse/ClickHouse/blob/484055984661a2ace7f6a62a1d9f9204ea7f04af/benchmark/compatible/databend/benchmark.sh
  • https://datasets.clickhouse.com/hits_compatible/hits.csv.gz
  • https://github.com/datafuselabs/databend/pull/6463/files

tasks:

  • [x] https://github.com/datafuselabs/databend-perf/pull/81
  • [x] add hits data into repo.databend.rs/hits/hits.tsv
  • [x] add partitioned hits data into repo.databend.rs/hits_p/*.tsv
  • [x] https://github.com/datafuselabs/databend/issues/6553
  • [x] https://github.com/datafuselabs/databend/issues/6554
  • [x] https://github.com/datafuselabs/databend-perf/pull/88
  • [ ] add queries about hits

flaneur2020 • Jul 05 '22 03:07

2022/07/08 13:32:30 the 0 time result has error query has error: &{Code:1046 Message:Parse csv error at line 93557185, cause: CSV error: record 93557186 (line: 93557187, byte: 69916103044): found record with 100 fields, but the previous record has 105 fields (while in processor thread 3) Kind:}, stats:

https://github.com/datafuselabs/databend-perf/runs/7251163192?check_suite_focus=true

flaneur2020 • Jul 09 '22 05:07

There may be a bug in parsing the CSV file. This script shows that lines 93557184 through 93557188 all have 105 fields when split on tabs:

╭─yazhou@yazhous-MBP ~/Downloads
╰─$ sed -n '93557185p' < hits.tsv | ruby -e "puts STDIN.read.split(/\t/).size"                                                 
105
╭─yazhou@yazhous-MBP ~/Downloads
╰─$ sed -n '93557184p' < hits.tsv | ruby -e "puts STDIN.read.split(/\t/).size"
105
╭─yazhou@yazhous-MBP ~/Downloads
╰─$ sed -n '93557186p' < hits.tsv | ruby -e "puts STDIN.read.split(/\t/).size"
105
╭─yazhou@yazhous-MBP ~/Downloads
╰─$ sed -n '93557187p' < hits.tsv | ruby -e "puts STDIN.read.split(/\t/).size"
105
╭─yazhou@yazhous-MBP ~/Downloads
╰─$ sed -n '93557188p' < hits.tsv | ruby -e "puts STDIN.read.split(/\t/).size"
105
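
The same per-line check can be sketched in Python, which avoids one `sed` pass per line. This is a minimal sketch, not the script used in the thread: the helper name `field_counts` is hypothetical, and the in-memory sample is made-up data standing in for `hits.tsv`.

```python
import io
from itertools import islice

def field_counts(stream, first_line, last_line, delimiter="\t"):
    """Yield (line_number, field_count) for 1-based lines first_line..last_line."""
    window = islice(stream, first_line - 1, last_line)
    for offset, line in enumerate(window):
        # Strip only the record terminator so trailing empty fields still count.
        yield first_line + offset, len(line.rstrip("\n").split(delimiter))

# Tiny in-memory stand-in for hits.tsv (made-up data, not the real file).
sample = io.StringIO('a\tb\tc\nd\te\t\n"f\tg\th\n')
result = list(field_counts(sample, 2, 3))
print(result)
```

For the real file, the stream could be something like `open("hits.tsv", encoding="utf-8", errors="replace")`; `errors="replace"` is an assumption here, chosen because some sample rows appear to contain bytes that are not valid UTF-8.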

flaneur2020 • Jul 09 '22 05:07

╰─$ sed -n '93557185,93557189p' < hits.tsv                                                                                                                                                                                                                                
8816084601099399273	1	на люкс - Формула 1 из мясо	1	2013-07-21 11:13:35	2013-07-21	122458	1389213378	15225	7900892430046930758	0	44	3	http://ufa/deti74.ru		0	0	0	419	216	1917	554	23	15	7	700	0	0	17	D�	1	1	0	0			1365155	0	0		0	0	778	815	135	2013-07-20 20:42:03	0	0	0	0	windows	1601	0	00	5632622125792883430		736845179	0	0	0	0	0	g	2013-07-20 21:40:01	0	0	0	0	0	1984237733	5053	-1	1	S0	h1			0	0	0	0	37	9	79	0		0		NH	0					ad_cpamarket	30533	rwtr_cl_bu_compaign		Other_cities		0	-296158784638538920	-8631670417943857411	0
4972262175479248633	1	на люкс - Формула 1 из мясо	1	2013-07-21 11:41:13	2013-07-21	122458	1389213378	15225	7900892430046930758	0	44	3	http://ufa/deti74.ru		0	0	0	419	216	1917	554	23	15	7	700	0	0	17	D�	1	1	0	0			1365155	0	0		0	0	778	815	135	2013-07-20 21:07:36	0	0	0	0	windows	1601	0	00	5632622125792883430		281031983	0	0	0	0	0	g	2013-07-20 22:00:41	0	0	0	0	0	1984237733	6821	-1	1	S0	h1			0	0	12	0	58	9	187	0		0		NH	0					ad_cpamarket	30533	rwtr_cl_bu_compaign		резюме екатеринбург		0	-296158784638538920	-8631670417943857411	0
7163949730692512472	1	на люкс - Формула 1 из мясо	1	2013-07-21 11:43:02	2013-07-21	122458	1389213378	15225	7900892430046930758	0	44	3	http://ufa/deti74.ru		0	0	0	419	216	1917	554	23	15	7	700	0	0	17	D�	1	1	0	0			1365155	0	0		0	0	778	815	135	2013-07-20 21:09:41	0	0	0	0	windows	1601	0	00	5632622125792883430		561426345	0	0	0	0	0	g	2013-07-20 22:02:15	0	0	0	0	0	1984237733	36786	-1	1	S0	h1			0	0	0	0	44	12	149	0		0		NH	0					ad_cpamarket	30533	rwtr_cl_bu_compaign		"tatuirovarki_redmond 70		0	-296158784638538920	-8631670417943857411	0
8049079189872617845	1	на люкс - Формула 1 из мясо	1	2013-07-21 11:46:20	2013-07-21	122458	1389213378	15225	7900892430046930758	0	44	3	http://ufa/deti74.ru		0	0	0	419	216	1917	554	23	15	7	700	0	0	17	D�	1	1	0	0			1365155	0	0		0	0	778	815	135	2013-07-20 21:13:39	0	0	0	0	windows	1601	0	00	5632622125792883430		806899868	0	0	0	0	0	g	2013-07-20 22:04:25	0	0	0	0	0	1984237733	45732	-1	1	S0	h1			0	0	0	0	46	14	187	0		0		NH	0					ad_cpamarket	30533	rwtr_cl_bu_compaign		investirovat_na_denginessman		0	-296158784638538920	-8631670417943857411	0
7329647432517701398	1	на люкс - Формула 1 из мясо	1	2013-07-21 12:04:55	2013-07-21	122458	1389213378	15225	7900892430046930758	0	44	3	http://ufa/deti74.ru		0	0	0	419	216	1917	554	23	15	7	700	0	0	17	D�	1	1	0	0			1365155	0	0		0	0	778	815	135	2013-07-20 21:32:59	0	0	0	0	windows	1601	0	00	5632622125792883430		692849168	0	0	0	0	0	g	2013-07-20 22:22:58	0	0	0	0	0	1984237733	64518	-1	1	S0	h1			0	0	0	0	59	9	250	0		0		NH	0					ad_cpamarket	30533	rwtr_cl_bu_compaign		1028cba65195		0	-296158784638538920	-8631670417943857411	0
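
One thing that stands out in the rows above is that the third row (line 93557187, the line named in the error) has a field that starts with a double quote and never closes it. A plain tab split ignores quotes, while a quote-aware CSV parser may treat that quote as opening a quoted field and swallow the delimiters that follow. Whether this is what happens inside Databend is not confirmed in this thread; the sketch below only shows, with Python's standard `csv` module and made-up rows, how the two ways of counting can disagree. The function names are hypothetical.

```python
import csv
import io

def naive_counts(text, delimiter="\t"):
    """Count fields per line by splitting on the delimiter, ignoring quotes."""
    return [len(line.split(delimiter)) for line in text.splitlines()]

def quote_aware_counts(text, delimiter="\t"):
    """Count fields per record with a parser that honours double quotes."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=delimiter, quotechar='"')
    return [len(record) for record in reader]

# Made-up rows: the first one has a field that opens a quote and never closes it.
rows = 'a\tb\t"c d\te\n' 'f\tg\th\ti\n'

naive = naive_counts(rows)
quoted = quote_aware_counts(rows)
print("naive split:  ", naive)
print("quote-aware:  ", quoted)
```

If the counts differ on a slice of the real file as well, that would point at quote handling rather than at the data itself having a wrong number of columns.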

flaneur2020 • Jul 09 '22 05:07

I get an error when loading this subset: hits_bug.csv

COPY INTO hits FROM '@hits' files=('hits_bug.csv') file_format = (type = 'csv' field_delimiter = "\t" record_delimiter = "\n" skip_header = 0);
 1046: Cannot parse value:Ok("5632622125792883430") to number type, cause: Overflow(5) (while in processor thread 0)
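
As a sanity check on the overflow message, the quoted value can be compared against common fixed-width integer ranges. This is a rough sketch with a hypothetical helper; the type names are generic labels, not a statement about the column types in the `hits` table, which this thread does not show.

```python
def fitting_int_types(value):
    """Return the names of fixed-width integer types whose range contains value."""
    candidates = {
        "Int32": (-2**31, 2**31 - 1),
        "UInt32": (0, 2**32 - 1),
        "Int64": (-2**63, 2**63 - 1),
        "UInt64": (0, 2**64 - 1),
    }
    return [name for name, (lo, hi) in candidates.items() if lo <= value <= hi]

value = 5632622125792883430  # the value quoted in the error message
fits = fitting_int_types(value)
print(fits)
```

The value fits in a 64-bit integer but not in a 32-bit one, so an overflow here would be consistent with the value being parsed into a narrower column than intended, for example if fields were shifted by an earlier parsing problem. That is a guess, not something established above.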

But my databend-query version is a bit old (v0.7.115-nightly), so I'll try this file again with 0.7.121.

flaneur2020 • Jul 09 '22 05:07