Open-Source-Orchard-Core-Extensions
Open-Source-Orchard-Core-Extensions copied to clipboard
Remove `build-create-binary-log` and `dotnet-test-process-timeout` configurations from OSOCE workflows (OSOE-834)
After https://github.com/Lombiq/UI-Testing-Toolbox/issues/228, remove the build-create-binary-log
and dotnet-test-process-timeout
configurations from OSOCE workflows.
As per https://github.com/Lombiq/UI-Testing-Toolbox/issues/228#issuecomment-2033024965:
Lets keep the current configuration for a whilw (1-2 months).
Then we can also remove it from Lombiq Tenants Core.
I again see random, but frequent Windows hangs. E.g. the two runs here: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/8976025812?pr=746. These hang here, only telling me that a PS command is hanging, which is the dotnet test
one:
Then this same code worked here and here, while getting stuck here and here (I disabled hand timeouts deliberately for these two to see what happens).
Do you perhaps have any suggestions?
Twice in a line, it was really rare. I can say.
As I see in the dump, the AsyncStreamReader.ReadBufferAsync
is waiting to read something from stdout
or stderr
of the dotnet
process. The dotnet process was not dumped so possibly it was finished before the timeout expired.
The dotnet
process was exited or the ::warning::The process $($process.Id) didn't exit in $Timeout seconds.
was not logged for an unknown reason.
I don't know if it is possible, but maybe this can be caused by a special character or control character coming out earlier on the given stream.
I'll look into the logs and dumps if it happens again. Please report similar failures under this issue.
Thank you for looking into this! How do you get to see what you show in your first screenshot? This is all I see when opening the dump (Debug with Mixed):
I've done three test runs, with hang timeouts: the first one passed, the other two hang, see here and here. The hang dumb has the same.
These are for some new code, see https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/pull/746, that might cause new issues too (I don't see why). And there's no output from the hanging test run.
But! If I skip TimeoutTests
, then I get 4 passes in a row (two runs here, then here and here). @sarahelsaig does this ring a bell? Something in the timeout config seems to get tests to stuck.
You can turn on the Tasks
window by selecting Debug/Windows/Tasks
item. See:
Once you have the Tasks
window double click on the line selected below:
Also Threads
, Watch N
, and Call Stack
windows come in handy in similar situations.
I'll look at the logs latest tomorrow.
Ah, thanks!
Looking at these, there's a bunch of Applications Insights references, what seems quite strange (however, no stack trace beyond a depth of 1, so I don't see where those are actually coming from). In any case, I removed the Application Insights module of ours being enabled during tests. It resulted in two passes (here and here, and two hangs (here and here). These still include AI references in the same way. So, it's not our AI, but something by GitHub.
I also did some test runs with reruns of what we have in dev
(here and here), and they passed (though the second one failed with an "ENOENT: no such file or directory..." error, that's a problem with the artifact upload).
BTW it's always the Lombiq.Tests.UI.Samples
that hangs. And there's no output, though I thought we'd have that now with streaming test outputs. Locally I can see the same behavior when running Invoke-SolutionTests
: no output from a test project until the next one starts.
Having the test run's output show up in the console output would help immensely to pinpoint where this goes wrong.
That may be an xUnit/dotnet test issue; I've opened https://github.com/xunit/xunit/discussions/2925.
That's something xUnit needs to add: https://github.com/xunit/visualstudio.xunit/issues/408.
I added a debug log file that gets updated immediately on every ITestOutputHelper
write in the FailuredDumps, and observed a hang here: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/9033609153. That log and the workflow output both show that the tests in Lombiq.OSOCE.Tests.UI
finished (49 completed, as it should be), but still we get test failed:
2024-05-10T14:42:10.5686978Z Results File: C:\a\Open-Source-Orchard-Core-Extensions\Open-Source-Orchard-Core-Extensions\test\Lombiq.OSOCE.Tests.UI\TestResults\test-results.trx
2024-05-10T14:42:10.5687128Z
2024-05-10T14:42:10.5688680Z Test Run Successful.
2024-05-10T14:42:10.5689829Z Passed: 49
2024-05-10T14:42:10.5691093Z Total tests: 49
2024-05-10T14:42:10.5692141Z Total time: 3.5414 Minutes
2024-05-10T14:42:10.5713126Z Test failed: test\Lombiq.OSOCE.Tests.UI\Lombiq.OSOCE.Tests.UI.csproj
2024-05-10T14:42:10.6026845Z ##[error]Process completed with exit code 1.
2024-05-10T14:42:10.6151953Z ##[group]Run Merge-FailureDumps -Directory "."
2024-05-10T14:42:10.6152088Z [36;1mMerge-FailureDumps -Directory "."[0m
2024-05-10T14:42:10.6183772Z shell: C:\Program Files\PowerShell\7\pwsh.EXE -command ". '{0}'"
2024-05-10T14:42:10.6183861Z env:
2024-05-10T14:42:10.6184153Z CHECKOUT_TOKEN: ***
2024-05-10T14:42:10.6184278Z DOTNET_SKIP_FIRST_TIME_EXPERIENCE: true
2024-05-10T14:42:10.6184386Z DOTNET_CLI_TELEMETRY_OPTOUT: true
2024-05-10T14:42:10.6184494Z DOTNET_ROOT: C:\Program Files\dotnet
2024-05-10T14:42:10.6184592Z DOTNET_SDK_TO_USE: 8.0.204
2024-05-10T14:42:10.6184869Z Lombiq_Tests_UI__OrchardCoreUITestExecutorConfiguration__MaxParallelTests: 0
2024-05-10T14:42:10.6184946Z ##[endgroup]
2024-05-10T14:42:10.8788979Z
2024-05-10T14:42:10.8795785Z Directory: C:\a\Open-Source-Orchard-Core-Extensions\Open-Source-Orchard-Core-Extensions
2024-05-10T14:42:10.8796453Z
2024-05-10T14:42:10.8797334Z [32;1mMode [0m[32;1m LastWriteTime[0m [32;1;3m Length[0m[32;1m Name[0m
2024-05-10T14:42:10.8797934Z [32;1m---- [0m [32;1m -------------[0m [32;1m ------[0m [32;1m----[0m
2024-05-10T14:42:10.8944776Z d---- 5/10/2024 2:42 PM [44;1mFailureDumps[0m
This is with dotnet-test-process-timeout: 600000
and blame-hang-timeout
being empty. It looks like some kind of premature task kill to me. Also, Write-Output
in StartProcessAndWaitForExit
won't work, since in PS output isn't surfaced from functions for some reason. I've fixed that with | Out-Host
. Then something strange happend, as you can see under this build:
-
Lombiq.OSOCE.Tests.UI
ran to completion with "Test Run Successful.",Lombiq.Tests.UI.Samples
didn't start. - But still, "Test failed: test\Lombiq.OSOCE.Tests.UI\Lombiq.OSOCE.Tests.UI.csproj ##[error]Process completed with exit code 1."
- So, it seems that this is not an actual hang but somehow the
dotnest test
process didn't exit, or didn't exit with the exit code 0, or somehow this line stopped working.
Worth noting that previously, I've seen different behaviors with actual hangs, where a test project was started but didn't finish. Not this time though.
It was a PS syntax issue: https://github.com/Lombiq/GitHub-Actions/pull/346/files#diff-c9181b664fb498c4bf7694492ba4c67f5c9fe31a8b86be446ca2a80b0fdde3c6L206 After that and adjusting the timeouts, it seems OK, e.g.: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/9055050994/attempts/1. And randomly the whole build can be horribly slow, but still succeed despite the test process being killed: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/9055173471/attempts/1. So, this looks good now!
I think why hangs became more frequent, also generally, is because we switched from GitRunners to GitHub Windows runners after the former shut down. And the performance of these varies wildly, with even the build taking between 5 and 10 minutes. So, if we by chance get a worse runner, then tests will be slower too.
I think why hangs became more frequent, also generally, is because we switched from GitRunners to GitHub Windows runners after the former shut down. And the performance of these varies wildly, with even the build taking between 5 and 10 minutes. So, if we by chance get a worse runner, then tests will be slower too.
OK then!