Baffling random failures in core Python behaviour in Windows CI
I'm seeing very frequent (once every 2-3 runs) random failures in CI, which happen
- exclusively on `windows-latest` runners, and
- exclusively on Python 3.13 and 3.14 (not 3.14t).
- I cannot yet say conclusively if they only happen in `actions/setup-python` or also inside `actions/cibuildwheel`.
The failures are as follows:
C:\hostedtoolcache\windows\Python\3.13.9\x64\Lib\site-packages\hypothesis\statistics.py:90: in describe_statistics
    runtime_ms = format_ms(t["runtime"] for t in cases)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
times = <generator object describe_statistics.<locals>.<genexpr> at 0x00000200EA90CAD0>
    def format_ms(times: Iterable[float]) -> str:
        """Format `times` into a string representing approximate milliseconds.
        `times` is a collection of durations in seconds.
        """
        ordered = sorted(times)
        n = len(ordered) - 1
        if n < 0 or any(math.isnan(t) for t in ordered):  # pragma: no cover
            return "NaN ms"
>       lower = int(ordered[math.floor(n * 0.05)] * 1000)
                            ^^^^^^^^^^^^^^^^^^^^
E       OverflowError: cannot convert float infinity to integer
C:\hostedtoolcache\windows\Python\3.13.9\x64\Lib\site-packages\hypothesis\statistics.py:58: OverflowError
============================ slowest 10 durations =============================
1.06s call tests/test_gemm.py::test_threads_share_input
0.59s call tests/test_dotv.py::test_threads_share_input
0.42s call tests/test_dotv.py::test_memoryview_noconj
0.25s call tests/test_gemm.py::test_memoryview_notrans
`times` is a collection of runtimes for a test, and I expect it to be fairly short: anywhere between 0 and a few thousand elements.
Note the underline beneath `math.floor(n * 0.05)`, which says that `n * 0.05` evaluated to `inf`.
Either:
- `list.__len__` returned float `inf` instead of the expected (fairly small) integer, or
- a small integer * 0.05 returned `inf`.
Both of the above are absurd and I have no explanation.
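For reference, a minimal sketch of what a healthy interpreter does with the failing sub-expression. `floor_index` is a hypothetical helper that only mirrors `math.floor(n * 0.05)` from the traceback; it is not part of hypothesis. It also shows that an absurdly large integer would fail with a different message, so the reported error really does require a float `inf` reaching `math.floor`.

```python
import math


def floor_index(n):
    """Hypothetical helper mirroring the failing sub-expression."""
    return math.floor(n * 0.05)


# Any plausible list length gives a finite product and a valid index.
for n in (0, 1, 19, 5_000, 10**9):
    assert math.isfinite(n * 0.05)
    assert 0 <= floor_index(n) <= n

# The reported message is what math.floor raises when handed infinity.
try:
    math.floor(float("inf"))
except OverflowError as exc:
    inf_message = str(exc)

# An integer too large for a float fails earlier, in the multiplication,
# and with a different message, so a merely "huge" n cannot explain it.
try:
    (10**400) * 0.05
except OverflowError as exc:
    huge_int_message = str(exc)

print(inf_message)
print(huge_int_message)
```

If this passes on the affected runners in isolation, that would point away from the arithmetic itself and towards something corrupting state during the test run.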
@liam-devoe have you ever seen anything like this? Somehow in hypothesis internals, pure-python expressions aren't evaluating to the correct result.
Example: https://github.com/explosion/cython-blis/actions/runs/20043099915
This is completely cursed and I have no explanation beyond the two hypotheses you've posed 😄. I've never seen something like this before.