Helix workitem results not getting updated
FR discussion
https://teams.microsoft.com/l/message/19:[email protected]/1752674760745?tenantId=72f988bf-86f1-41af-91ab-2d7cd011db47&groupId=4d73664c-9f2f-450d-82a5-c2f02756606d&parentMessageId=1752674760745&teamName=.NET%20Core%20Eng%20Services%20Partners&channelName=First%20Responders&createdTime=1752674760745
Symptoms
Looking at the details (for example: https://helix.dot.net/api/jobs/85499675-db32-441d-a906-6d5a3ceb6649/details?api-version=2019-06-17)
you'll see a result like
"WorkItems":{"Unscheduled":221,"Waiting":0,"Running":0,"Finished":222,"
Impact
Low, it's likely that Helix is functioning properly, but because of the issue, reported results are inaccurate
Investigation
Looking in app insight (dotnet-eng) I see entries like this...
problemId: System.ArgumentNullException at Azure.Storage.Blobs.Specialized.BlobBaseClient..ctor
callstack: Azure.Storage.Blobs.Specialized.BlobBaseClient..ctor at :0
Azure.Storage.Blobs.Specialized.BlobBaseClient..ctor at :0
Azure.Storage.Blobs.BlobClient..ctor at :0
HelixAPI.Controllers.AzDevExtensionController+<GetDelegationSasToken>d__19.MoveNext at S:\src\ServiceFabric\Helix\HelixAPI\Controllers\AzDevExtensionController.cs:289
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
System.Runtime.CompilerServices.TaskAwaiter`1.GetResult at :0
HelixAPI.Controllers.AzDevExtensionController+<MapModel>d__16.MoveNext at S:\src\ServiceFabric\Helix\HelixAPI\Controllers\AzDevExtensionController.cs:237
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
System.Runtime.CompilerServices.TaskAwaiter`1.GetResult at :0
HelixAPI.Controllers.AzDevExtensionController+<>c__DisplayClass15_0+<<MapModel>b__2>d.MoveNext at S:\src\ServiceFabric\Helix\HelixAPI\Controllers\AzDevExtensionController.cs:221
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
System.Runtime.CompilerServices.TaskAwaiter`1.GetResult at :0
HelixAPI.Controllers.AzDevExtensionController+<MapModel>d__15.MoveNext at S:\src\ServiceFabric\Helix\HelixAPI\Controllers\AzDevExtensionController.cs:222
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
System.Runtime.CompilerServices.TaskAwaiter`1.GetResult at :0
HelixAPI.Controllers.AzDevExtensionController+<WorkItemResultData>d__14.MoveNext at S:\src\ServiceFabric\Helix\HelixAPI\Controllers\AzDevExtensionController.cs:210
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
System.Runtime.CompilerServices.TaskAwaiter`1.GetResult at :0
lambda_method2961 at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor+TaskOfActionResultExecutor+<Execute>d__0.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker+<<InvokeActionMethodAsync>g__Logged|12_1>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker+<<InvokeNextActionFilterAsync>g__Awaited|10_0>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker+<<InvokeInnerFilterAsync>g__Awaited|13_0>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker+<<InvokeNextResourceFilter>g__Awaited|25_0>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Rethrow at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Next at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker+<<InvokeFilterPipelineAsync>g__Awaited|20_0>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker+<<InvokeAsync>g__Logged|17_1>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker+<<InvokeAsync>g__Logged|17_1>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
Microsoft.AspNetCore.Routing.EndpointMiddleware+<<Invoke>g__AwaitRequestTask|7_0>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
HelixAPI.Shared.InternalUserExtensions+<>c+<<UseHelixUserInformation>b__0_0>d.MoveNext at S:\src\ServiceFabric\Helix\HelixAPI\Shared\HelixDataService.cs:36
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
Microsoft.AspNetCore.Authorization.AuthorizationMiddleware+<Invoke>d__11.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
Microsoft.AspNetCore.Authentication.AuthenticationMiddleware+<Invoke>d__6.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
HelixAPI.ErrorHandlers+<HandleForStatusCodePages>d__1.MoveNext at S:\src\ServiceFabric\Helix\HelixAPI\ErrorHandlers.cs:36
Looking at the code, it seems that at this point, https://dev.azure.com/dnceng/internal/_git/dotnet-helix-service?path=/src/ServiceFabric/Helix/HelixAPI/Controllers/AzDevExtensionController.cs&version=GBmain&line=237&lineEnd=238&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents, the service has decided that the user is authenticated even though this is for a public PR. The result is that getting a delegation sas fails (obviously), and then the exception is thrown and the rest of the code doesn't execute.
Release Note Category
- [ ] Feature changes/additions
- [ ] Bug fixes
- [ ] Internal Infrastructure Improvements
Release Note Description
Job details for the hanging Helix result
{
"QueueId": "ubuntu.2204.amd64.open",
"JobList": "https://helixd1107v0xdeko0k025g8.blob.core.windows.net/helix-job-b5fae70f-9adb-4448-8c3c-0f79b240b083fa38dfc953a4da489/job-list-68bf364d-e1a0-4c77-9cea-303610785dbf.json?helixlogtype=result",
"WorkItems": {
"Unscheduled": 221,
"Waiting": 0,
"Running": 0,
"Finished": 222,
"ListUrl": "https://helix.dot.net/api/jobs/85499675-db32-441d-a906-6d5a3ceb6649/workitems?api-version=2019-06-17"
},
"Name": "85499675-db32-441d-a906-6d5a3ceb6649",
"Creator": "sdk-dev",
"Created": "2025-07-15T19:12:56.3480000+00:00",
"Finished": "2025-07-15T21:44:46.3130000+00:00",
"InitialWorkItemCount": 221,
"WaitUrl": "https://helix.dot.net/api/jobs/85499675-db32-441d-a906-6d5a3ceb6649/wait?api-version=2019-06-17",
"Source": "pr/public/dotnet/sdk/refs/pull/49749/merge",
"Type": "test/product/",
"Build": "",
"Properties": {
"System.JobId": "313dc793-1f35-51a8-9bbb-d1c21e4ca4e1",
"System.JobName": "__default",
"System.PhaseName": "ContainerBased_linux_x64_azureLinux30Net10BuildAmd64",
"System.StageAttempt": "2",
"CollectionUri": "https://dev.azure.com/dnceng-public/",
"System.PullRequest.TargetBranch": "main",
"DefinitionName": "dotnet-sdk-public-ci",
"System.JobAttempt": "2",
"Reason": "PullRequest",
"operatingSystem": "[email protected]/dotnet-buildtools/prereqs:ubuntu-24.04-helix-amd64",
"AzurePipelinesTestRunId": "29914744",
"System.PhaseAttempt": "2",
"BuildId": "1095283",
"BuildNumber": "20250715.25",
"Project": "public",
"DefinitionId": "101",
"System.StageName": "build"
},
"Errors": []
}
Job details for a "working" Helix result
{
"QueueId": "ubuntu.2204.amd64",
"JobList": "https://helixd0107v0xdeko0k025g8.blob.core.windows.net/helix-job-67975bd9-369e-4d07-9bad-98b5eddeb675d76d100216c407cb1/job-list-33ce586a-63f7-41df-9443-d9a580f34076.json?sv=2024-11-04&se=2025-08-16T18%3A45%3A17Z&sr=c&sp=rl&sig=I8lC4613WN2WP0a2p9mtbYTRsIdPgk%2BMZCxWRohWxzw%3D",
"WorkItems": {
"Unscheduled": 0,
"Waiting": 0,
"Running": 0,
"Finished": 222,
"ListUrl": "https://helix.dot.net/api/jobs/19a63f02-e413-49d8-b37f-b9b8aa0f644f/workitems?api-version=2019-06-17"
},
"Name": "19a63f02-e413-49d8-b37f-b9b8aa0f644f",
"Creator": "dotnet-bot",
"Created": "2025-07-17T18:46:07.3470000+00:00",
"Finished": "2025-07-17T19:23:30.9400000+00:00",
"InitialWorkItemCount": 221,
"WaitUrl": "https://helix.dot.net/api/jobs/19a63f02-e413-49d8-b37f-b9b8aa0f644f/wait?api-version=2019-06-17",
"Source": "official/internal/dotnet-sdk/refs/heads/main",
"Type": "test/product/",
"Build": "",
"Properties": {
"DefinitionId": "140",
"System.JobAttempt": "1",
"BuildId": "2752338",
"AzurePipelinesTestRunId": "55498705",
"DefinitionName": "dotnet-sdk-official-ci",
"BuildNumber": "20250717.4",
"System.StageName": "build",
"System.StageAttempt": "1",
"System.JobName": "__default",
"operatingSystem": "ubuntu.2204.amd64",
"CollectionUri": "https://dev.azure.com/dnceng/",
"System.JobId": "d68b235e-d7af-5bf6-c242-f1e29d5991d8",
"System.PhaseName": "TestBuild_linux_x64",
"System.PhaseAttempt": "1",
"Reason": "BatchedCI",
"Project": "internal"
},
"Errors": []
}
However, i also see quite a few instances where it appears that the job completed reasonably, but it is exhibiting the behavior where it shows a large number of "unscheduled" jobs.
{
"QueueId": "ubuntu.2204.amd64.open.svc",
"JobList": "https://helixd1107v0xd1eu3ibi6ka.blob.core.windows.net/helix-job-4b653eb7-9e6f-40e7-9911-18ef393559c540cca69242b430c9e/job-list-403edca7-14f7-4454-9803-74bfba934a82.json?helixlogtype=result",
"WorkItems": {
"Unscheduled": 19,
"Waiting": 0,
"Running": 0,
"Finished": 20,
"ListUrl": "https://helix.dot.net/api/jobs/64bf5d8c-8954-4f82-8e6b-c19733e54412/workitems?api-version=2019-06-17"
},
"Name": "64bf5d8c-8954-4f82-8e6b-c19733e54412",
"Creator": "mmitche",
"Created": "2025-07-16T18:51:39.8120000+00:00",
"Finished": "2025-07-16T19:02:15.4860000+00:00",
"InitialWorkItemCount": 19,
"WaitUrl": "https://helix.dot.net/api/jobs/64bf5d8c-8954-4f82-8e6b-c19733e54412/wait?api-version=2019-06-17",
"Source": "pr/public/dotnet/arcade/refs/pull/15973/merge",
"Type": "test/product/",
"Build": "",
"Properties": {
"System.StageAttempt": "1",
"System.PhaseName": "Windows_NT",
"System.StageName": "Test",
"System.JobId": "21143b0a-ace1-5058-215e-b11404388d17",
"BuildId": "1096804",
"Reason": "PullRequest",
"DefinitionName": "arcade-pr",
"System.JobAttempt": "1",
"CollectionUri": "https://dev.azure.com/dnceng-public/",
"System.JobName": "Build_Release",
"AzurePipelinesTestRunId": "29977956",
"operatingSystem": "(Centos.7.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:centos-7-mlnet-helix",
"Project": "public",
"System.PhaseAttempt": "1",
"DefinitionId": "283",
"System.PullRequest.TargetBranch": "release/6.0",
"BuildNumber": "20250716.4"
},
"Errors": []
}
i continue to notice that the hanged job has System.JobAttempt > 1. Continuing to search for a common pattern
@chcosta please update to reflect the current state of the world.