Open-Source-Orchard-Core-Extensions Remove `build-create-binary-log` and `dotnet-test-process-timeout` configurations from OSOCE workflows (OSOE-834)

After https://github.com/Lombiq/UI-Testing-Toolbox/issues/228, remove the build-create-binary-log and dotnet-test-process-timeout configurations from OSOCE workflows.

Jira issue

Apr 02 '24 20:04 dministro

As per https://github.com/Lombiq/UI-Testing-Toolbox/issues/228#issuecomment-2033024965:

Lets keep the current configuration for a whilw (1-2 months).

Then we can also remove it from Lombiq Tenants Core.

Apr 03 '24 00:04 Piedone

I again see random, but frequent Windows hangs. E.g. the two runs here: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/8976025812?pr=746. These hang here, only telling me that a PS command is hanging, which is the dotnet test one:

Then this same code worked here and here, while getting stuck here and here (I disabled hand timeouts deliberately for these two to see what happens).

Do you perhaps have any suggestions?

May 07 '24 00:05 Piedone

Twice in a line, it was really rare. I can say.

As I see in the dump, the AsyncStreamReader.ReadBufferAsync is waiting to read something from stdout or stderr of the dotnet process. The dotnet process was not dumped so possibly it was finished before the timeout expired.

The dotnet process was exited or the ::warning::The process $($process.Id) didn't exit in $Timeout seconds. was not logged for an unknown reason.

I don't know if it is possible, but maybe this can be caused by a special character or control character coming out earlier on the given stream.

I'll look into the logs and dumps if it happens again. Please report similar failures under this issue.

May 07 '24 18:05 dministro

Thank you for looking into this! How do you get to see what you show in your first screenshot? This is all I see when opening the dump (Debug with Mixed):

I've done three test runs, with hang timeouts: the first one passed, the other two hang, see here and here. The hang dumb has the same.

These are for some new code, see https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/pull/746, that might cause new issues too (I don't see why). And there's no output from the hanging test run.

But! If I skip TimeoutTests, then I get 4 passes in a row (two runs here, then here and here). @sarahelsaig does this ring a bell? Something in the timeout config seems to get tests to stuck.

May 07 '24 20:05 Piedone

You can turn on the Tasks window by selecting Debug/Windows/Tasks item. See:

Once you have the Tasks window double click on the line selected below:

Also Threads, Watch N, and Call Stack windows come in handy in similar situations.

I'll look at the logs latest tomorrow.

May 07 '24 21:05 dministro

Ah, thanks!

Looking at these, there's a bunch of Applications Insights references, what seems quite strange (however, no stack trace beyond a depth of 1, so I don't see where those are actually coming from). In any case, I removed the Application Insights module of ours being enabled during tests. It resulted in two passes (here and here, and two hangs (here and here). These still include AI references in the same way. So, it's not our AI, but something by GitHub.

I also did some test runs with reruns of what we have in dev (here and here), and they passed (though the second one failed with an "ENOENT: no such file or directory..." error, that's a problem with the artifact upload).

May 07 '24 22:05 Piedone

BTW it's always the Lombiq.Tests.UI.Samples that hangs. And there's no output, though I thought we'd have that now with streaming test outputs. Locally I can see the same behavior when running Invoke-SolutionTests: no output from a test project until the next one starts.

Having the test run's output show up in the console output would help immensely to pinpoint where this goes wrong.

May 09 '24 20:05 Piedone

That may be an xUnit/dotnet test issue; I've opened https://github.com/xunit/xunit/discussions/2925.

May 09 '24 23:05 Piedone

That's something xUnit needs to add: https://github.com/xunit/visualstudio.xunit/issues/408.

May 10 '24 12:05 Piedone

I added a debug log file that gets updated immediately on every ITestOutputHelper write in the FailuredDumps, and observed a hang here: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/9033609153. That log and the workflow output both show that the tests in Lombiq.OSOCE.Tests.UI finished (49 completed, as it should be), but still we get test failed:

2024-05-10T14:42:10.5686978Z Results File: C:\a\Open-Source-Orchard-Core-Extensions\Open-Source-Orchard-Core-Extensions\test\Lombiq.OSOCE.Tests.UI\TestResults\test-results.trx
2024-05-10T14:42:10.5687128Z 
2024-05-10T14:42:10.5688680Z Test Run Successful.
2024-05-10T14:42:10.5689829Z      Passed: 49
2024-05-10T14:42:10.5691093Z Total tests: 49
2024-05-10T14:42:10.5692141Z  Total time: 3.5414 Minutes
2024-05-10T14:42:10.5713126Z Test failed: test\Lombiq.OSOCE.Tests.UI\Lombiq.OSOCE.Tests.UI.csproj
2024-05-10T14:42:10.6026845Z ##[error]Process completed with exit code 1.
2024-05-10T14:42:10.6151953Z ##[group]Run Merge-FailureDumps -Directory "."
2024-05-10T14:42:10.6152088Z [36;1mMerge-FailureDumps -Directory "."[0m
2024-05-10T14:42:10.6183772Z shell: C:\Program Files\PowerShell\7\pwsh.EXE -command ". '{0}'"
2024-05-10T14:42:10.6183861Z env:
2024-05-10T14:42:10.6184153Z   CHECKOUT_TOKEN: ***
2024-05-10T14:42:10.6184278Z   DOTNET_SKIP_FIRST_TIME_EXPERIENCE: true
2024-05-10T14:42:10.6184386Z   DOTNET_CLI_TELEMETRY_OPTOUT: true
2024-05-10T14:42:10.6184494Z   DOTNET_ROOT: C:\Program Files\dotnet
2024-05-10T14:42:10.6184592Z   DOTNET_SDK_TO_USE: 8.0.204
2024-05-10T14:42:10.6184869Z   Lombiq_Tests_UI__OrchardCoreUITestExecutorConfiguration__MaxParallelTests: 0
2024-05-10T14:42:10.6184946Z ##[endgroup]
2024-05-10T14:42:10.8788979Z 
2024-05-10T14:42:10.8795785Z     Directory: C:\a\Open-Source-Orchard-Core-Extensions\Open-Source-Orchard-Core-Extensions
2024-05-10T14:42:10.8796453Z 
2024-05-10T14:42:10.8797334Z [32;1mMode   [0m[32;1m              LastWriteTime[0m [32;1;3m        Length[0m[32;1m Name[0m
2024-05-10T14:42:10.8797934Z [32;1m----   [0m [32;1m             -------------[0m [32;1m        ------[0m [32;1m----[0m
2024-05-10T14:42:10.8944776Z d----           5/10/2024  2:42 PM                [44;1mFailureDumps[0m

This is with dotnet-test-process-timeout: 600000 and blame-hang-timeout being empty. It looks like some kind of premature task kill to me. Also, Write-Output in StartProcessAndWaitForExit won't work, since in PS output isn't surfaced from functions for some reason. I've fixed that with | Out-Host. Then something strange happend, as you can see under this build:

Lombiq.OSOCE.Tests.UI ran to completion with "Test Run Successful.", Lombiq.Tests.UI.Samples didn't start.
But still, "Test failed: test\Lombiq.OSOCE.Tests.UI\Lombiq.OSOCE.Tests.UI.csproj ##[error]Process completed with exit code 1."
So, it seems that this is not an actual hang but somehow the dotnest test process didn't exit, or didn't exit with the exit code 0, or somehow this line stopped working.

Worth noting that previously, I've seen different behaviors with actual hangs, where a test project was started but didn't finish. Not this time though.

May 10 '24 17:05 Piedone

It was a PS syntax issue: https://github.com/Lombiq/GitHub-Actions/pull/346/files#diff-c9181b664fb498c4bf7694492ba4c67f5c9fe31a8b86be446ca2a80b0fdde3c6L206 After that and adjusting the timeouts, it seems OK, e.g.: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/9055050994/attempts/1. And randomly the whole build can be horribly slow, but still succeed despite the test process being killed: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/9055173471/attempts/1. So, this looks good now!

May 13 '24 01:05 Piedone

I think why hangs became more frequent, also generally, is because we switched from GitRunners to GitHub Windows runners after the former shut down. And the performance of these varies wildly, with even the build taking between 5 and 10 minutes. So, if we by chance get a worse runner, then tests will be slower too.

May 13 '24 20:05 Piedone

I think why hangs became more frequent, also generally, is because we switched from GitRunners to GitHub Windows runners after the former shut down. And the performance of these varies wildly, with even the build taking between 5 and 10 minutes. So, if we by chance get a worse runner, then tests will be slower too.

This is what I tried to point here and here.

May 13 '24 20:05 dministro

OK then!

May 13 '24 21:05 Piedone

Open-Source-Orchard-Core-Extensions Open-Source-Orchard-Core-Extensions copied to clipboard

Remove `build-create-binary-log` and `dotnet-test-process-timeout` configurations from OSOCE workflows (OSOE-834)

Open-Source-Orchard-Core-Extensions
Open-Source-Orchard-Core-Extensions copied to clipboard