ci(test): add tpch stateless test of factor 0.1

Open edPanda opened this issue 3 years ago • 12 comments

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Add tpch stateless test of factor 0.1.

The results show no difference between Databend and PostgreSQL.

Fixes #6689

edPanda avatar Jul 21 '22 17:07 edPanda

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
databend ✅ Ready (Inspect) Visit Preview Aug 23, 2022 at 1:36AM (UTC)

vercel[bot] avatar Jul 21 '22 17:07 vercel[bot]

This pull request's title does not fulfill the requirements. @edPanda please update it 🙏.

Valid format:

fix(query): fix group by string bug
  ^         ^---------------------^
  |         |
  |         +-> Summary in present tense.
  |
  +-------> Type: feat, fix, refactor, ci, build, docs, website, chore

Valid types:

  • feat: this PR introduces a new feature to the codebase
  • fix: this PR patches a bug in the codebase
  • refactor: this PR changes the codebase without adding new features or bug fixes
  • ci|build: this PR changes build/testing/ci steps
  • docs|website: this PR changes the documents or websites
  • chore: this PR only has small changes that do not need to be recorded

mergify[bot] avatar Jul 21 '22 17:07 mergify[bot]

The data is too large to commit into git history. I suggest uploading it to a site like https://repo.databend.rs/.

sundy-li avatar Jul 22 '22 01:07 sundy-li

Is it possible to download the data from other places? This PR will add 869k lines.

Xuanwo avatar Jul 22 '22 01:07 Xuanwo

@edPanda Thanks for the contribution

  1. I have uploaded the datasets to http://repo.databend.rs/dataset/stateful/tpch.tar.gz; you can modify the script in 13_0000_prepare.sh to download the data using wget.
➜ curl --HEAD http://repo.databend.rs/dataset/stateful/tpch.tar.gz
HTTP/1.1 200 OK
Content-Type: application/x-tar
Content-Length: 30371995
Connection: keep-alive
Date: Fri, 22 Jul 2022 03:35:03 GMT
Last-Modified: Fri, 22 Jul 2022 03:30:05 GMT
ETag: "f2fc54240b1162fcfa6dd586eb2ab129-4"
Accept-Ranges: bytes
Server: AmazonS3
X-Cache: Miss from cloudfront
Via: 1.1 79e5bd56174a0ac9fbc66556743812d6.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: HKG62-C2
X-Amz-Cf-Id: XXDSDDvylZN71BzeA9jGMyk6cbN95cyVJbYVZ7_5zNElWJMQliMHtg==

  2. You can migrate this directory to be stateful tests.

It would be better to force-push, which removes the history created by the large test data.

sundy-li avatar Jul 22 '22 03:07 sundy-li

@sundy-li Thanks a lot.

Every time I run 'make stateless-test', the last few decimal digits of some SQL results are different. Can you help me find the reason? For example, sql1, sql5, and so on.

edPanda avatar Jul 22 '22 15:07 edPanda

Having 9 errors! 258 tests passed.                     0 tests skipped.
The failure tests:
    /workspace/tests/suites/0_stateless/13_tpch/13_0001_q1.sql
    /workspace/tests/suites/0_stateless/13_tpch/13_0005_q5.sql
    /workspace/tests/suites/0_stateless/13_tpch/13_0009_q9.sql
    /workspace/tests/suites/0_stateless/13_tpch/13_0010_q10.sql
    /workspace/tests/suites/0_stateless/13_tpch/13_0014_q14.sql
    /workspace/tests/suites/0_stateless/13_tpch/13_0015_q15.sql
    /workspace/tests/suites/0_stateless/13_tpch/13_0017_q17.sql
    /workspace/tests/suites/0_stateless/13_tpch/13_0019_q19.sql
    /workspace/tests/suites/0_stateless/13_tpch/13_0022_q22.sql

This looks strange, like:

--- /workspace/tests/suites/0_stateless/13_tpch/13_0022_q22.result	2022-07-22 23:27:25.080615735 +0000
+++ /workspace/tests/suites/0_stateless/13_tpch/13_0022_q22.stdout	2022-07-22 23:33:36.822927253 +0000
@@ -2,6 +2,6 @@
 17	96	722560.1499999998
 18	99	738012.5199999999
 23	93	708285.2499999998
-29	85	632693.4599999998
-30	87	646748.02
-31	87	647372.4999999999
+29	85	632693.4599999997
+30	87	646748.0199999999
+31	87	647372.4999999998
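A plausible cause of last-digit differences like the ones in this diff is that floating-point addition is not associative, so summing the same values in a different order (for example, when partial sums are combined in a different order between runs) can change the trailing digits. The thread does not confirm this as the root cause; the sketch below only illustrates the effect, and the three values are hypothetical, not taken from the TPC-H data.

```python
import math

# Three hypothetical partial sums; illustrative values, not TPC-H data.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c  # one possible evaluation order
right_to_left = a + (b + c)  # the same terms combined in another order

# The two orders disagree on exact equality...
print(left_to_right == right_to_left)
# ...but agree once a small relative tolerance is allowed.
print(math.isclose(left_to_right, right_to_left, rel_tol=1e-12))
```

An exact text comparison of a `.result` file against `.stdout` is sensitive to this kind of difference, which is why the test can fail even when the query logic is unchanged.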

cc @xudong963

BohuTANG avatar Jul 23 '22 01:07 BohuTANG

Seems related to https://github.com/datafuselabs/databend/issues/6213

xudong963 avatar Jul 23 '22 02:07 xudong963

The Q16 result is too large; we can use ORDER BY + LIMIT to output a shorter result.

I find that the last few digits of the decimal point of some sql results are different.

We can wrap the query with truncate temporarily until we have decimal types.
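The effect of truncating before comparing can be sketched in Python, using values from the q22 diff earlier in this thread. This is an illustration under assumptions, not Databend's implementation: `truncate_text` is a hypothetical helper that drops digits past a fixed number of decimal places, and two places is an arbitrary choice.

```python
from decimal import Decimal, ROUND_DOWN


def truncate_text(value: str, places: int = 2) -> Decimal:
    """Drop all digits past `places` decimal places, without rounding."""
    quantum = Decimal(1).scaleb(-places)  # places=2 gives 0.01
    return Decimal(value).quantize(quantum, rounding=ROUND_DOWN)


# Expected vs. actual values from the q22 diff (rows 29 and 31).
expected = ["632693.4599999998", "647372.4999999999"]
actual = ["632693.4599999997", "647372.4999999998"]

# After truncation, each pair compares equal.
stable = [truncate_text(e) == truncate_text(a) for e, a in zip(expected, actual)]
print(stable)

# Row 30 straddles a two-decimal boundary, so truncation alone
# still leaves a difference for this pair.
straddling = truncate_text("646748.02") == truncate_text("646748.0199999999")
print(straddling)
```

In this sketch, truncation stabilizes values that sit on the same side of a decimal boundary, but a pair such as row 30 can still differ, so the number of kept digits may need care.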

sundy-li avatar Jul 23 '22 04:07 sundy-li

ok, thanks!

edPanda avatar Jul 23 '22 05:07 edPanda

Since it has been a long time since the last submission, I will re-check the results with pg tomorrow.

edPanda avatar Aug 08 '22 16:08 edPanda

I think we can also add tpch q6 (previously not added due to accuracy issues) @edPanda

xudong963 avatar Aug 09 '22 02:08 xudong963

We can add this data directory into .gitignore

sundy-li avatar Aug 18 '22 15:08 sundy-li

I found that the calculation results of sql5 in cluster mode on macOS are very different, and that the Linux standalone and cluster environments do not have the wget command, which causes the test to fail. Here are the results of sql5 in cluster mode on macOS:

-CHINA 7822103.0
-INDIA 6376121.508
-JAPAN 6000077.218
-INDONESIA 5580475.402
-VIETNAM 4497840.546
+CHINA 2426891.126
+VIETNAM 1946778.709
+JAPAN 1702347.973
+INDONESIA 1579815.696
+INDIA 1169566.051

edPanda avatar Aug 21 '22 07:08 edPanda

I found that the calculation results of sql5 in the cluster of the mac system are very different,

It's an optional CI check; we will take a deeper look into it. cc @zhang2014

the standalone and cluster of linux do not have the wget command, which causes the test to fail.

@everpcpc Can you help with this?

sundy-li avatar Aug 21 '22 12:08 sundy-li

maybe we can use curl instead of wget
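One way to avoid depending on a single tool is to pick whichever downloader is installed. The sketch below only builds the command line and does not run it; `build_download_command` and the destination file name are hypothetical, and the actual prepare script in this PR may do this differently.

```python
import shutil
from typing import Callable, List, Optional

# Dataset URL mentioned earlier in this thread.
DATASET_URL = "http://repo.databend.rs/dataset/stateful/tpch.tar.gz"


def build_download_command(
    url: str,
    dest: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a download command, preferring curl and falling back to wget."""
    if which("curl"):
        # -f: fail on HTTP errors, -sS: quiet but show errors,
        # -L: follow redirects, -o: write to the given file.
        return ["curl", "-fsSL", "-o", dest, url]
    if which("wget"):
        # -q: quiet, -O: write to the given file.
        return ["wget", "-q", "-O", dest, url]
    raise RuntimeError("neither curl nor wget is available")


if __name__ == "__main__":
    try:
        print(build_download_command(DATASET_URL, "tpch.tar.gz"))
    except RuntimeError as err:
        print(err)
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; the `which` parameter is injectable only so the selection logic can be checked without either tool installed.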

everpcpc avatar Aug 21 '22 12:08 everpcpc

It does not work on Linux either:

13_0005_q5:                                                             [ FAIL ] - result differs with:
--- /workspace/tests/suites/0_stateless/13_tpch/13_0005_q5.result	2022-08-21 13:31:21.715785826 +0000
+++ /workspace/tests/suites/0_stateless/13_tpch/13_0005_q5.stdout	2022-08-21 13:34:04.362243534 +0000
@@ -1,5 +1,5 @@
-CHINA	7822103.0
-INDIA	6376121.508
-JAPAN	6000077.218
-INDONESIA	5580475.402
-VIETNAM	4497840.546
+CHINA	2426891.126
+VIETNAM	1946778.709
+JAPAN	1702347.973
+INDONESIA	1579815.696
+INDIA	1169566.051

https://github.com/datafuselabs/databend/runs/7939197859?check_suite_focus=true#step:4:241

cc @xudong963

BohuTANG avatar Aug 22 '22 01:08 BohuTANG

Can sql5 be skipped so that this PR can be finished?

edPanda avatar Aug 22 '22 13:08 edPanda

Can sql5 be skipped so that this PR can be finished?

Yes, you can comment out the test.

sundy-li avatar Aug 22 '22 13:08 sundy-li

@mergify update

BohuTANG avatar Aug 22 '22 13:08 BohuTANG

update

✅ Branch has been successfully updated

mergify[bot] avatar Aug 22 '22 13:08 mergify[bot]

How about squashing this PR so that we don't introduce unrelated changes, such as deleting out.log?

Xuanwo avatar Aug 22 '22 16:08 Xuanwo