Increase peer rows in partitions generated by window fuzzer

Open pramodsatya opened this issue 1 year ago • 1 comments

Increases the number of peer groups in each partition of the data generated by the window fuzzer (resolves https://github.com/facebookincubator/velox/issues/10184). First, a random integer n between 1 and the number of rows in the input data is generated; it represents the total number of peer groups across all partitions. Next, the total number of partitions of the input data, m, is generated such that 1 < m < n. The indices of a dictionary vector that maps each row's peer group to a partition (peerGroupToPartitionIndices) are then generated. The partition index for each row is then constructed from peerGroupToPartitionIndices.
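
The generation steps described above can be sketched roughly as follows. This is a standalone illustration using only the C++ standard library, not the code from this PR: the names PartitionAssignment, assignPartitions, rowToPeerGroup and rowToPartition are hypothetical, the uniform assignment of rows to peer groups is an assumption, and so is the fallback to a single partition when n < 3 (where no m with 1 < m < n exists).

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical container for this sketch; not a Velox type.
struct PartitionAssignment {
  int32_t numPeerGroups = 0; // n: total peer groups across all partitions
  int32_t numPartitions = 0; // m: total partitions
  std::vector<int32_t> peerGroupToPartitionIndices; // one entry per peer group
  std::vector<int32_t> rowToPeerGroup; // one entry per row
  std::vector<int32_t> rowToPartition; // one entry per row
};

PartitionAssignment assignPartitions(int32_t numRows, std::mt19937& rng) {
  assert(numRows > 0);
  PartitionAssignment out;

  // Step 1: n is a random integer in [1, numRows].
  out.numPeerGroups = std::uniform_int_distribution<int32_t>(1, numRows)(rng);

  // Step 2: m satisfies 1 < m < n. Assumption: when n < 3 no such m exists,
  // so this sketch falls back to a single partition.
  out.numPartitions = out.numPeerGroups >= 3
      ? std::uniform_int_distribution<int32_t>(2, out.numPeerGroups - 1)(rng)
      : 1;

  // Step 3: map every peer group to one partition.
  std::uniform_int_distribution<int32_t> pickPartition(
      0, out.numPartitions - 1);
  out.peerGroupToPartitionIndices.resize(out.numPeerGroups);
  for (auto& partition : out.peerGroupToPartitionIndices) {
    partition = pickPartition(rng);
  }

  // Step 4: give each row a peer group (assumed uniform here), then derive
  // the row's partition index through the peer-group-to-partition mapping.
  std::uniform_int_distribution<int32_t> pickPeerGroup(
      0, out.numPeerGroups - 1);
  out.rowToPeerGroup.resize(numRows);
  out.rowToPartition.resize(numRows);
  for (int32_t row = 0; row < numRows; ++row) {
    out.rowToPeerGroup[row] = pickPeerGroup(rng);
    out.rowToPartition[row] =
        out.peerGroupToPartitionIndices[out.rowToPeerGroup[row]];
  }
  return out;
}
```

Because the partition of a row is looked up through its peer group, all rows of one peer group land in the same partition, and a partition can hold several peer groups whenever m < n. For example, calling assignPartitions(100, rng) with a seeded std::mt19937 yields one partition index per row that can then be used to build the partition-key column.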

pramodsatya avatar Jun 24 '24 03:06 pramodsatya

Deploy Preview for meta-velox canceled.

Name Link
Latest commit a4a1bf158a332dc69bc81fea31fce4da7202ebbc
Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/677da7ea693f1500082b1223

netlify[bot] avatar Jun 24 '24 03:06 netlify[bot]

@kagamiori: Could you please help review?

aditi-pandit avatar Nov 06 '24 17:11 aditi-pandit

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Nov 15 '24 03:11 facebook-github-bot

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Nov 15 '24 16:11 facebook-github-bot

Hi @pramodsatya, the window fuzzer fails in CI: https://github.com/facebookincubator/velox/actions/runs/11850033972/job/33024861069?pr=10293. Could you take a look? Maybe run window fuzzer with Presto for one hour to ensure there is no failure and no timeout?

kagamiori avatar Nov 15 '24 23:11 kagamiori

Thanks @kagamiori, this error seems to have appeared after the rebase. I reran the window fuzzer with Presto as the reference query runner and I now see result mismatches. I will investigate further and update the PR.

pramodsatya avatar Nov 20 '24 01:11 pramodsatya

Hi @pramodsatya, have you found out what's wrong with the result mismatch in the window fuzzer?

kagamiori avatar Jan 03 '25 22:01 kagamiori

Hi @kagamiori, I am unable to reproduce the error seen in the CI job when running the window fuzzer locally (it looks like the seed information is not logged in the window fuzzer GitHub action?). When I run the window fuzzer locally with the Presto query runner for verification, the results do not match for columns of type TimestampWithTimeZone. I am looking into this mismatch and will provide an update soon.

pramodsatya avatar Jan 03 '25 22:01 pramodsatya

@pramodsatya Make sure you rebase onto the latest main before investigating, because I made some changes to the window fuzzer recently. For fuzzer failures in CI, you can find the seed in the saved artifact (e.g., https://github.com/facebookincubator/velox/actions/runs/11850033972/). There should be a log in the downloaded zip file, and you can find the last seed in that log (i.e., the seed right before the failure).

kagamiori avatar Jan 03 '25 22:01 kagamiori

Hi @kagamiori, the window fuzzer error no longer occurs after the rebase, either in a local run or in CI. The Spark Fuzzer CI job is currently failing, but the error appears unrelated to these changes and the same error is seen on other PRs as well. Could you please take another look and help merge the changes?

Results from a local run of the window fuzzer for 1 hour:

I20250107 16:18:37.296044 11850100 TempDirectoryPath.cpp:29] TempDirectoryPath:: removing all files from /tmp/velox_test_ebf6Fh
I20250107 16:18:37.299245 11850100 WindowFuzzer.cpp:557] ==============================> Done with iteration 1321
I20250107 16:18:37.301649 11850100 AggregationFuzzerBase.cpp:675] Total functions tested: 66
I20250107 16:18:37.301664 11850100 AggregationFuzzerBase.cpp:676] Total iterations requiring sorted inputs: 1032 (78.06%)
I20250107 16:18:37.301677 11850100 AggregationFuzzerBase.cpp:678] Total iterations verified against reference DB: 35 (2.65%)
I20250107 16:18:37.301684 11850100 AggregationFuzzerBase.cpp:680] Total functions not verified (verification skipped / not supported by reference DB / reference DB failed): 655 (49.55%) / 27 (2.04%) / 512 (38.73%)
I20250107 16:18:37.301690 11850100 AggregationFuzzerBase.cpp:685] Total failed functions: 146 (11.04%)
I20250107 16:18:37.301697 11850100 WindowFuzzer.cpp:817] Total functions verified in reference DB: 22
[==========] Running 0 tests from 0 test suites.
[==========] 0 tests from 0 test suites ran. (0 ms total)
[  PASSED  ] 0 tests.

pramodsatya avatar Jan 07 '25 23:01 pramodsatya

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Jan 09 '25 20:01 facebook-github-bot

@kagamiori merged this pull request in facebookincubator/velox@2b74a93bcb80dacee9a7a78a5e5722399fa3bcab.

facebook-github-bot avatar Jan 14 '25 01:01 facebook-github-bot