runtime icon indicating copy to clipboard operation
runtime copied to clipboard

TimeSpan overflowed because the duration is too long.

Open jkotas opened this issue 7 months ago • 9 comments

System.OverflowException: TimeSpan overflowed because the duration is too long.
   at System.TimeSpan.op_Subtraction(TimeSpan, TimeSpan) + 0x34
   at Program.<<Main>$>g__TestExecutor123|0_124(StreamWriter tempLogSw, StreamWriter statsCsvSw, Program.<>c__DisplayClass0_0&) + 0x138
Failed test: JIT/Regression/JitBlue/Runtime_108612/Runtime_108612/Runtime_108612.dll

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=1057538 Build error leg or test failing: JIT/Regression/JitBlue/Runtime_108612/Runtime_108612/Runtime_108612.dll Pull request: https://github.com/dotnet/runtime/pull/115339

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "TimeSpan overflowed because the duration is too long.",
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Known issue validation

Build: :mag_right: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1057538 Error message validated: [TimeSpan overflowed because the duration is too long. Runtime_108612] Result validation: :white_check_mark: Known issue matched with the provided build. Validation performed at: 6/3/2025 6:28:20 PM UTC

Report

Build Definition Test Pull Request
1057538 dotnet/runtime JIT/Regression/JitBlue/Runtime_108612/Runtime_108612/Runtime_108612.dll dotnet/runtime#115339

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
1 1 1

Known issue validation

Build: :mag_right: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1057538 Error message validated: [TimeSpan overflowed because the duration is too long.] Result validation: :white_check_mark: Known issue matched with the provided build. Validation performed at: 6/3/2025 7:45:55 PM UTC

Report

Build Definition Test Pull Request
1057538 dotnet/runtime JIT/Regression/JitBlue/Runtime_108612/Runtime_108612/Runtime_108612.dll dotnet/runtime#115339

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
1 1 1

Report

Build Definition Test Pull Request
1067213 dotnet/runtime JIT/Regression/JitBlue/Runtime_108612/Runtime_108612/Runtime_108612.dll
1066743 dotnet/runtime JIT/Regression/VS-ia64-JIT/V1.2-M02/b26496/b26496/b26496.dll
1065977 dotnet/runtime JIT/Regression/VS-ia64-JIT/V1.2-M02/b26496/b26496/b26496.dll
1064725 dotnet/runtime JIT/Regression/VS-ia64-JIT/V1.2-M02/b26496/b26496/b26496.dll
1063412 dotnet/runtime JIT/Regression/JitBlue/Runtime_108612/Runtime_108612/Runtime_108612.dll
1062325 dotnet/runtime JIT/Regression/JitBlue/Runtime_108612/Runtime_108612/Runtime_108612.dll
1061900 dotnet/runtime JIT/Regression/VS-ia64-JIT/V1.2-M02/b26496/b26496/b26496.dll
1061766 dotnet/runtime JIT/Regression/VS-ia64-JIT/V1.2-M02/b26496/b26496/b26496.dll
1061713 dotnet/runtime JIT/Regression/VS-ia64-JIT/V1.2-M02/b26496/b26496/b26496.dll dotnet/runtime#116316
1060830 dotnet/runtime JIT/Regression/JitBlue/Runtime_108612/Runtime_108612/Runtime_108612.dll
1060395 dotnet/runtime JIT/Regression/JitBlue/Runtime_108612/Runtime_108612/Runtime_108612.dll dotnet/runtime#116324
1059304 dotnet/runtime JIT/Regression/JitBlue/Runtime_108612/Runtime_108612/Runtime_108612.dll
1058735 dotnet/runtime JIT/Regression/JitBlue/Runtime_108612/Runtime_108612/Runtime_108612.dll dotnet/runtime#108083
1058259 dotnet/runtime JIT/Regression/VS-ia64-JIT/V1.2-M02/b26496/b26496/b26496.dll
1057538 dotnet/runtime JIT/Regression/JitBlue/Runtime_108612/Runtime_108612/Runtime_108612.dll dotnet/runtime#115339

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
1 9 15

jkotas avatar Jun 03 '25 18:06 jkotas

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Test introduced by https://github.com/dotnet/runtime/pull/108727.

cc @kunalspathak

jkotas avatar Jun 03 '25 18:06 jkotas

The test does not use TimeSpan at all. This looks like an exception thrown by the merged test wrapper boilerplate code, but I am not sure how that could happen.

jakobbotsch avatar Jun 03 '25 18:06 jakobbotsch

Agree and it doesn't look that the test was even running for longer time:

10:29:21.736 Running test: JIT/Regression/JitBlue/Runtime_108609/Runtime_108609/Runtime_108609.dll
10:29:21.736 Passed test: JIT/Regression/JitBlue/Runtime_108609/Runtime_108609/Runtime_108609.dll
10:29:21.736 Running test: JIT/Regression/JitBlue/Runtime_108612/Runtime_108612/Runtime_108612.dll
System.OverflowException: TimeSpan overflowed because the duration is too long.
   at System.TimeSpan.op_Subtraction(TimeSpan, TimeSpan) + 0x34
   at Program.<<Main>$>g__TestExecutor123|0_124(StreamWriter tempLogSw, StreamWriter statsCsvSw, Program.<>c__DisplayClass0_0&) + 0x138
10:29:21.739 Failed test: JIT/Regression/JitBlue/Runtime_108612/Runtime_108612/Runtime_108612.dll
10:29:21.740 Running test: JIT/Regression/JitBlue/Runtime_108613/Runtime_108613/Runtime_108613.dll

kunalspathak avatar Jun 03 '25 18:06 kunalspathak

Tagging subscribers to this area: @hoyosjs See info in area-owners.md if you want to be subscribed.

From https://github.com/dotnet/runtime/pull/115339#issuecomment-2936900158: It can randomly fail for any test using merged test executor (not specific to this test).

jkotas avatar Jun 03 '25 19:06 jkotas

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas See info in area-owners.md if you want to be subscribed.

This appears to be native AOT specific.

jkotas avatar Jun 12 '25 02:06 jkotas

cc @jkoritzinsky

agocke avatar Jun 12 '25 04:06 agocke

stopwatch.Elapsed in the expression from https://github.com/dotnet/runtime/blob/b97f98b7bdd339ee3424e81ec0cfe4b40590268e/src/tests/Common/XUnitWrapperGenerator/ITestInfo.cs#L477 is TimeSpan with raw tick count value of Int64.MinValue (0x8000_0000_0000_0000). This corruption is intermittent.

Here is the offending code (with instrumentation from #116411):

...
0000b490`3fb67700 97fd7ebf bl          Regression_2!SystemNative_GetTimestamp (0000b490`3fac71fc)
0000b490`3fb67704 f94013b6 ldr         x22,[fp,#0x20]
0000b490`3fb67708 f9400ac1 ldr         x1,[x22,#0x10]
0000b490`3fb6770c cb010000 sub         x0,x0,x1
0000b490`3fb67710 8b170017 add         x23,x0,x23
0000b490`3fb67714 d0002580 adrp        x0,Regression_2!g_CrashInfoBuffer+0x1da8 (0000b490`40019000)
0000b490`3fb67718 9135c000 add         x0,x0,#0xD70
0000b490`3fb6771c b9400000 ldr         w0,[x0]
0000b490`3fb67720 350004e0 cbnz        w0,Regression_2!Regression_2_Program::<<Main>$>g__TestExecutor123|0_124+0x23c (0000b490`3fb677bc)
0000b490`3fb67724 f9400fb4 ldr         x20,[fp,#0x18]
0000b490`3fb67728 f9401bb5 ldr         x21,[fp,#0x30]
0000b490`3fb6772c 9e6202f0 scvtf       d16,x23
0000b490`3fb67730 1e680a10 fmul        d16,d16,d8
0000b490`3fb67734 9e780200 fcvtzs      x0,d16
0000b490`3fb67738 cb130016 sub         x22,x0,x19
0000b490`3fb6773c 937ffc01 asr         x1,x0,#0x3F
0000b490`3fb67740 937ffe63 asr         x3,x19,#0x3F
0000b490`3fb67744 937ffec4 asr         x4,x22,#0x3F
0000b490`3fb67748 eb03003f cmp         x1,x3
0000b490`3fb6774c fa441024 ccmpne      x1,x4,#4
0000b490`3fb67750 54000060 beq         Regression_2!Regression_2_Program::<<Main>$>g__TestExecutor123|0_124+0x1dc (0000b490`3fb6775c)
0000b490`3fb67754 aa1303e1 mov         x1,x19
0000b490`3fb67758 9404cdaa bl          Regression_2!S_P_CoreLib_System_TimeSpan::<op_Subtraction>g__ThrowOverflowException_TimeSpanTooLong|141_0 (0000b490`3fc9ae00)
0000b490`3fb6775c f9401fa2 ldr         x2,[fp,#0x38]
...

0000b490`3fb677bc 97ff7027 bl          Regression_2!RhpGcPoll (0000b490`3fb43858)
0000b490`3fb677c0 f9400fb4 ldr         x20,[fp,#0x18]
0000b490`3fb677c4 f9401bb5 ldr         x21,[fp,#0x30]
0000b490`3fb677c8 17ffffd9 b           Regression_2!Regression_2_Program::<<Main>$>g__TestExecutor123|0_124+0x1ac (0000b490`3fb6772c)

Given that the corruption is intermittent, the bug is likely caused by RhpGcPoll path clobbering some register.

jkotas avatar Jun 29 '25 23:06 jkotas

the bug is likely caused by RhpGcPoll path clobbering some register.

It would not explain why it is consistently failing on these two specific tests only. The exact same code is in many other tests.

These two test must be doing something specific that triggers the failure. The only way the test itself may cause this is by clobbering d8 that makes fmul d16,d16,d8 to produce bogus results. These two tests do in fact save/restore d8 during execution but I do not see anything wrong with how they do that.

jkotas avatar Jun 30 '25 00:06 jkotas

Observations: It seems to be failing on coreclr linux arm64 Checked @ (AzureLinux.3.0.ArmArch.Open)[email protected]/dotnet-buildtools/prereqs:azurelinux-3.0-helix-arm64v8 only. It does not fail on other linux arm64 variants.

jkotas avatar Jun 30 '25 21:06 jkotas

The article https://eclecticlight.co/2021/07/19/are-there-flaws-in-some-arm64-instructions/ discusses issues related to the fmul-fadd pattern. While this case doesn’t involve fadd, it’s possible there are microarchitectural quirks or edge cases affecting fmul on its own as well?

am11 avatar Jun 30 '25 21:06 am11

fmul-add behavior wrt. rounding errors is expected and well documented. I do not see

I expect this is going to be tracked down to a "boring" memory overwrite bug at the end. It is possible that this bug is in AzureLinux.3.0.

jkotas avatar Jun 30 '25 21:06 jkotas

These two tests do in fact save/restore d8 during execution but I do not see anything wrong with how they do that.

I have confirmed that the d8 value is corrupted. It is getting corrupted with an object reference allocated by the test, with high and low bits swapped.

Example of corrupted value: 0xd8cc54380000f727 . 0000f727d8cc5438 is a valid object reference to recently allocated object.

jkotas avatar Jul 04 '25 11:07 jkotas