dnceng icon indicating copy to clipboard operation
dnceng copied to clipboard

Helix workitem results not getting updated

Open chcosta opened this issue 5 months ago • 2 comments

FR discussion

https://teams.microsoft.com/l/message/19:[email protected]/1752674760745?tenantId=72f988bf-86f1-41af-91ab-2d7cd011db47&groupId=4d73664c-9f2f-450d-82a5-c2f02756606d&parentMessageId=1752674760745&teamName=.NET%20Core%20Eng%20Services%20Partners&channelName=First%20Responders&createdTime=1752674760745

Symptoms

Looking at the details (for example: https://helix.dot.net/api/jobs/85499675-db32-441d-a906-6d5a3ceb6649/details?api-version=2019-06-17)

you'll see a result like

"WorkItems":{"Unscheduled":221,"Waiting":0,"Running":0,"Finished":222,"

Impact

Low, it's likely that Helix is functioning properly, but because of the issue, reported results are inaccurate

Investigation

Looking in app insight (dotnet-eng) I see entries like this...

problemId: System.ArgumentNullException at Azure.Storage.Blobs.Specialized.BlobBaseClient..ctor

callstack: Azure.Storage.Blobs.Specialized.BlobBaseClient..ctor at :0
Azure.Storage.Blobs.Specialized.BlobBaseClient..ctor at :0
Azure.Storage.Blobs.BlobClient..ctor at :0
HelixAPI.Controllers.AzDevExtensionController+<GetDelegationSasToken>d__19.MoveNext at S:\src\ServiceFabric\Helix\HelixAPI\Controllers\AzDevExtensionController.cs:289
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
System.Runtime.CompilerServices.TaskAwaiter`1.GetResult at :0
HelixAPI.Controllers.AzDevExtensionController+<MapModel>d__16.MoveNext at S:\src\ServiceFabric\Helix\HelixAPI\Controllers\AzDevExtensionController.cs:237
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
System.Runtime.CompilerServices.TaskAwaiter`1.GetResult at :0
HelixAPI.Controllers.AzDevExtensionController+<>c__DisplayClass15_0+<<MapModel>b__2>d.MoveNext at S:\src\ServiceFabric\Helix\HelixAPI\Controllers\AzDevExtensionController.cs:221
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
System.Runtime.CompilerServices.TaskAwaiter`1.GetResult at :0
HelixAPI.Controllers.AzDevExtensionController+<MapModel>d__15.MoveNext at S:\src\ServiceFabric\Helix\HelixAPI\Controllers\AzDevExtensionController.cs:222
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
System.Runtime.CompilerServices.TaskAwaiter`1.GetResult at :0
HelixAPI.Controllers.AzDevExtensionController+<WorkItemResultData>d__14.MoveNext at S:\src\ServiceFabric\Helix\HelixAPI\Controllers\AzDevExtensionController.cs:210
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
System.Runtime.CompilerServices.TaskAwaiter`1.GetResult at :0
lambda_method2961 at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor+TaskOfActionResultExecutor+<Execute>d__0.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker+<<InvokeActionMethodAsync>g__Logged|12_1>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker+<<InvokeNextActionFilterAsync>g__Awaited|10_0>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker+<<InvokeInnerFilterAsync>g__Awaited|13_0>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker+<<InvokeNextResourceFilter>g__Awaited|25_0>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Rethrow at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Next at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker+<<InvokeFilterPipelineAsync>g__Awaited|20_0>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker+<<InvokeAsync>g__Logged|17_1>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker+<<InvokeAsync>g__Logged|17_1>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
Microsoft.AspNetCore.Routing.EndpointMiddleware+<<Invoke>g__AwaitRequestTask|7_0>d.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
HelixAPI.Shared.InternalUserExtensions+<>c+<<UseHelixUserInformation>b__0_0>d.MoveNext at S:\src\ServiceFabric\Helix\HelixAPI\Shared\HelixDataService.cs:36
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
Microsoft.AspNetCore.Authorization.AuthorizationMiddleware+<Invoke>d__11.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
Microsoft.AspNetCore.Authentication.AuthenticationMiddleware+<Invoke>d__6.MoveNext at :0
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw at :0
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess at :0
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification at :0
HelixAPI.ErrorHandlers+<HandleForStatusCodePages>d__1.MoveNext at S:\src\ServiceFabric\Helix\HelixAPI\ErrorHandlers.cs:36

Looking at the code, it seems that at this point, https://dev.azure.com/dnceng/internal/_git/dotnet-helix-service?path=/src/ServiceFabric/Helix/HelixAPI/Controllers/AzDevExtensionController.cs&version=GBmain&line=237&lineEnd=238&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents, the service has decided that the user is authenticated even though this is for a public PR. The result is that getting a delegation sas fails (obviously), and then the exception is thrown and the rest of the code doesn't execute.

Release Note Category

  • [ ] Feature changes/additions
  • [ ] Bug fixes
  • [ ] Internal Infrastructure Improvements

Release Note Description

chcosta avatar Jul 17 '25 18:07 chcosta

Job details for the hanging Helix result

{
  "QueueId": "ubuntu.2204.amd64.open",
  "JobList": "https://helixd1107v0xdeko0k025g8.blob.core.windows.net/helix-job-b5fae70f-9adb-4448-8c3c-0f79b240b083fa38dfc953a4da489/job-list-68bf364d-e1a0-4c77-9cea-303610785dbf.json?helixlogtype=result",
  "WorkItems": {
    "Unscheduled": 221,
    "Waiting": 0,
    "Running": 0,
    "Finished": 222,
    "ListUrl": "https://helix.dot.net/api/jobs/85499675-db32-441d-a906-6d5a3ceb6649/workitems?api-version=2019-06-17"
  },
  "Name": "85499675-db32-441d-a906-6d5a3ceb6649",
  "Creator": "sdk-dev",
  "Created": "2025-07-15T19:12:56.3480000+00:00",
  "Finished": "2025-07-15T21:44:46.3130000+00:00",
  "InitialWorkItemCount": 221,
  "WaitUrl": "https://helix.dot.net/api/jobs/85499675-db32-441d-a906-6d5a3ceb6649/wait?api-version=2019-06-17",
  "Source": "pr/public/dotnet/sdk/refs/pull/49749/merge",
  "Type": "test/product/",
  "Build": "",
  "Properties": {
    "System.JobId": "313dc793-1f35-51a8-9bbb-d1c21e4ca4e1",
    "System.JobName": "__default",
    "System.PhaseName": "ContainerBased_linux_x64_azureLinux30Net10BuildAmd64",
    "System.StageAttempt": "2",
    "CollectionUri": "https://dev.azure.com/dnceng-public/",
    "System.PullRequest.TargetBranch": "main",
    "DefinitionName": "dotnet-sdk-public-ci",
    "System.JobAttempt": "2",
    "Reason": "PullRequest",
    "operatingSystem": "[email protected]/dotnet-buildtools/prereqs:ubuntu-24.04-helix-amd64",
    "AzurePipelinesTestRunId": "29914744",
    "System.PhaseAttempt": "2",
    "BuildId": "1095283",
    "BuildNumber": "20250715.25",
    "Project": "public",
    "DefinitionId": "101",
    "System.StageName": "build"
  },
  "Errors": []
}

Job details for a "working" Helix result

{
  "QueueId": "ubuntu.2204.amd64",
  "JobList": "https://helixd0107v0xdeko0k025g8.blob.core.windows.net/helix-job-67975bd9-369e-4d07-9bad-98b5eddeb675d76d100216c407cb1/job-list-33ce586a-63f7-41df-9443-d9a580f34076.json?sv=2024-11-04&se=2025-08-16T18%3A45%3A17Z&sr=c&sp=rl&sig=I8lC4613WN2WP0a2p9mtbYTRsIdPgk%2BMZCxWRohWxzw%3D",
  "WorkItems": {
    "Unscheduled": 0,
    "Waiting": 0,
    "Running": 0,
    "Finished": 222,
    "ListUrl": "https://helix.dot.net/api/jobs/19a63f02-e413-49d8-b37f-b9b8aa0f644f/workitems?api-version=2019-06-17"
  },
  "Name": "19a63f02-e413-49d8-b37f-b9b8aa0f644f",
  "Creator": "dotnet-bot",
  "Created": "2025-07-17T18:46:07.3470000+00:00",
  "Finished": "2025-07-17T19:23:30.9400000+00:00",
  "InitialWorkItemCount": 221,
  "WaitUrl": "https://helix.dot.net/api/jobs/19a63f02-e413-49d8-b37f-b9b8aa0f644f/wait?api-version=2019-06-17",
  "Source": "official/internal/dotnet-sdk/refs/heads/main",
  "Type": "test/product/",
  "Build": "",
  "Properties": {
    "DefinitionId": "140",
    "System.JobAttempt": "1",
    "BuildId": "2752338",
    "AzurePipelinesTestRunId": "55498705",
    "DefinitionName": "dotnet-sdk-official-ci",
    "BuildNumber": "20250717.4",
    "System.StageName": "build",
    "System.StageAttempt": "1",
    "System.JobName": "__default",
    "operatingSystem": "ubuntu.2204.amd64",
    "CollectionUri": "https://dev.azure.com/dnceng/",
    "System.JobId": "d68b235e-d7af-5bf6-c242-f1e29d5991d8",
    "System.PhaseName": "TestBuild_linux_x64",
    "System.PhaseAttempt": "1",
    "Reason": "BatchedCI",
    "Project": "internal"
  },
  "Errors": []
}

However, i also see quite a few instances where it appears that the job completed reasonably, but it is exhibiting the behavior where it shows a large number of "unscheduled" jobs.

{
  "QueueId": "ubuntu.2204.amd64.open.svc",
  "JobList": "https://helixd1107v0xd1eu3ibi6ka.blob.core.windows.net/helix-job-4b653eb7-9e6f-40e7-9911-18ef393559c540cca69242b430c9e/job-list-403edca7-14f7-4454-9803-74bfba934a82.json?helixlogtype=result",
  "WorkItems": {
    "Unscheduled": 19,
    "Waiting": 0,
    "Running": 0,
    "Finished": 20,
    "ListUrl": "https://helix.dot.net/api/jobs/64bf5d8c-8954-4f82-8e6b-c19733e54412/workitems?api-version=2019-06-17"
  },
  "Name": "64bf5d8c-8954-4f82-8e6b-c19733e54412",
  "Creator": "mmitche",
  "Created": "2025-07-16T18:51:39.8120000+00:00",
  "Finished": "2025-07-16T19:02:15.4860000+00:00",
  "InitialWorkItemCount": 19,
  "WaitUrl": "https://helix.dot.net/api/jobs/64bf5d8c-8954-4f82-8e6b-c19733e54412/wait?api-version=2019-06-17",
  "Source": "pr/public/dotnet/arcade/refs/pull/15973/merge",
  "Type": "test/product/",
  "Build": "",
  "Properties": {
    "System.StageAttempt": "1",
    "System.PhaseName": "Windows_NT",
    "System.StageName": "Test",
    "System.JobId": "21143b0a-ace1-5058-215e-b11404388d17",
    "BuildId": "1096804",
    "Reason": "PullRequest",
    "DefinitionName": "arcade-pr",
    "System.JobAttempt": "1",
    "CollectionUri": "https://dev.azure.com/dnceng-public/",
    "System.JobName": "Build_Release",
    "AzurePipelinesTestRunId": "29977956",
    "operatingSystem": "(Centos.7.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:centos-7-mlnet-helix",
    "Project": "public",
    "System.PhaseAttempt": "1",
    "DefinitionId": "283",
    "System.PullRequest.TargetBranch": "release/6.0",
    "BuildNumber": "20250716.4"
  },
  "Errors": []
}

i continue to notice that the hanged job has System.JobAttempt > 1. Continuing to search for a common pattern

chcosta avatar Jul 17 '25 21:07 chcosta

@chcosta please update to reflect the current state of the world.

missymessa avatar Nov 06 '25 21:11 missymessa