[SPARK-48751][INFRA][PYTHON][TESTS] Re-balance `pyspark-pandas-connect` tests on GA
### What changes were proposed in this pull request?
This PR aims to re-balance the `pyspark-pandas-connect` tests on GA.
### Why are the changes needed?
To bring the execution time of the `pyspark-pandas-connect-part[0-3]` test groups to a roughly even level, avoiding long tails that drive up the overall GA execution time.
Here are some currently observed examples:
- https://github.com/apache/spark/pull/47135/checks?check_run_id=26784966983
  Most parts take around 1 hour, but part2 costs 1h 49m and part3 costs 2h 16m.
- https://github.com/panbingkun/spark/actions/runs/9693237300
  Most parts take around 1 hour, but part2 costs 1h 47m and part3 costs 2h 20m.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually observed the execution time of `pyspark-pandas-connect-part[0-3]`.
### Was this patch authored or co-authored using generative AI tooling?
No.
The following steps were used to re-balance:
- Download the logs from GA, extract the execution time of each UT, and compute the total execution time of each `part*`, e.g. `pyspark-pandas-connect-part0`:
```
2024-06-28T05:27:37.6183255Z Finished test(python3.11): pyspark.pandas.tests.connect.computation.test_parity_any_all (36s)
2024-06-28T05:29:27.2891428Z Finished test(python3.11): pyspark.pandas.tests.connect.computation.test_parity_apply_func (109s)
2024-06-28T05:29:51.3194664Z Finished test(python3.11): pyspark.pandas.tests.connect.computation.test_parity_binary_ops (24s)
2024-06-28T05:32:15.4889334Z Finished test(python3.11): pyspark.pandas.tests.connect.computation.test_parity_combine (144s)
2024-06-28T05:33:37.6599457Z Finished test(python3.11): pyspark.pandas.tests.connect.computation.test_parity_compute (82s)
2024-06-28T05:36:26.0724168Z Finished test(python3.11): pyspark.pandas.tests.connect.computation.test_parity_corr (168s)
2024-06-28T05:39:31.0848420Z Finished test(python3.11): pyspark.pandas.tests.connect.computation.test_parity_corrwith (185s)
2024-06-28T05:40:06.1762415Z Finished test(python3.11): pyspark.pandas.tests.connect.computation.test_parity_cov (35s)
2024-06-28T05:40:54.8319822Z Finished test(python3.11): pyspark.pandas.tests.connect.computation.test_parity_cumulative (48s)
2024-06-28T05:41:36.4479258Z Finished test(python3.11): pyspark.pandas.tests.connect.computation.test_parity_describe (41s)
2024-06-28T05:41:51.9689250Z Finished test(python3.11): pyspark.pandas.tests.connect.computation.test_parity_eval (15s)
2024-06-28T05:42:37.0018704Z Finished test(python3.11): pyspark.pandas.tests.connect.computation.test_parity_melt (45s)
...
```
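The per-part totals can be extracted mechanically from such logs. A minimal sketch (the regex and the `python3.11` tag are assumptions based on the log excerpt above, not code from this PR):

```python
import re

# Matches GA log lines such as:
#   ... Finished test(python3.11): pyspark.pandas.tests....test_parity_any_all (36s)
LINE_RE = re.compile(r"Finished test\(python3\.11\): (\S+) \((\d+)s\)")

def parse_durations(log_lines):
    """Return a {test_module: seconds} dict parsed from GA log lines."""
    durations = {}
    for line in log_lines:
        m = LINE_RE.search(line)
        if m:
            durations[m.group(1)] = int(m.group(2))
    return durations

log = [
    "2024-06-28T05:27:37.6183255Z Finished test(python3.11): "
    "pyspark.pandas.tests.connect.computation.test_parity_any_all (36s)",
    "2024-06-28T05:29:27.2891428Z Finished test(python3.11): "
    "pyspark.pandas.tests.connect.computation.test_parity_apply_func (109s)",
]
durations = parse_durations(log)
print(sum(durations.values()))  # total seconds for this part (145 here)
```

Running this over each part's full log yields the per-part totals analyzed below.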
- Through statistics and analysis:
| Part | Total Cost | Diff |
|---|---|---|
| pyspark-pandas-connect-part0 | 4075 s | 4075 - 5187.5 = -1112.5 s |
| pyspark-pandas-connect-part1 | 4087 s | 4087 - 5187.5 = -1100.5 s |
| pyspark-pandas-connect-part2 | 5371 s | 5371 - 5187.5 = +183.5 s |
| pyspark-pandas-connect-part3 | 7217 s | 7217 - 5187.5 = +2029.5 s |
| Avg Cost | 5187.5 s | |
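The arithmetic behind the table: the average is (4075 + 4087 + 5371 + 7217) / 4 = 5187.5 s, and each diff is the part's total minus that average. For example:

```python
# Per-part totals measured from the GA logs, in seconds.
costs = {
    "pyspark-pandas-connect-part0": 4075,
    "pyspark-pandas-connect-part1": 4087,
    "pyspark-pandas-connect-part2": 5371,
    "pyspark-pandas-connect-part3": 7217,
}
avg = sum(costs.values()) / len(costs)
diffs = {part: cost - avg for part, cost in costs.items()}
print(avg)                                    # 5187.5
print(diffs["pyspark-pandas-connect-part3"])  # 2029.5
```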
- Based on the diffs above, move test modules from the high-cost `part*` groups to the low-cost ones to reach the final balance.
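The PR performs this move by hand in the test group definitions. As an illustration only (not what the PR implements), the same assignment could be automated with a greedy longest-processing-time heuristic: place each module, heaviest first, into the currently lightest part.

```python
import heapq

def balance(durations, num_parts=4):
    """Greedily assign test modules (heaviest first) to the currently
    lightest part. Returns a list of (total_seconds, part_index, modules)."""
    parts = [(0, i, []) for i in range(num_parts)]
    heapq.heapify(parts)
    for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
        total, i, modules = heapq.heappop(parts)  # lightest part so far
        modules.append(name)
        heapq.heappush(parts, (total + secs, i, modules))
    return sorted(parts)

# Toy example with hypothetical module durations.
groups = balance({"a": 10, "b": 9, "c": 8, "d": 7, "e": 1}, num_parts=2)
print([total for total, _, _ in groups])  # [17, 18]
```

In practice a manual move is preferable here, since the grouping must stay stable and reviewable in the build configuration.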
After this PR:
- First run: https://github.com/panbingkun/spark/actions/runs/9718805972
  part0: 1h 44m, part1: 1h 40m, part2: 1h 45m, part3: 1h 44m
- Second run: https://github.com/panbingkun/spark/actions/runs/9721535055
  part0: 1h 45m, part1: 1h 43m, part2: 1h 49m, part3: 1h 45m
cc @zhengruifeng and @itholic
Looks fine for now, but in the future we might need to split this into more parts instead of just rebalancing, if the number of tests increases.
Nope, actually splitting the build increases resource usage, so I asked to distribute the existing test cases for now. We got a bit of pushback from the ASF.
Merged to master.
Late LGTM
> Nope, actually splitting the build increases resource usage, so I asked to distribute the existing test cases for now. We got a bit of pushback from the ASF.
I'm actually quite curious, what does this mean: "We got a bit of pushback from the ASF"?
Does the ASF require us to reduce resource usage?
we now have limited resources. See also https://issues.apache.org/jira/browse/SPARK-48094
> we now have limited resources. See also https://issues.apache.org/jira/browse/SPARK-48094
Okay, I see, thanks.