Test Reporting failing to complete because of performance of dnceng-public

Open missymessa opened this issue 3 years ago • 4 comments

From @karelz

Folks,

I have a Kusto query which does not report all results as they can be seen in Runfo. Runfo query shows failures 9/7-9/22 in last 30 days, while Kusto has only the 9/22 failure.

Any idea why there is such a discrepancy?

Thanks! -Karel

cluster('engsrvprod.kusto.windows.net').database('engineeringdata').AzureDevOpsTests
| where TestName contains 'FileSystemWatcher_Directory_Delete_MultipleFilters'
| distinct JobId, WorkItemId, Message, StackTrace, TestName, Arguments, Outcome
| join kind=inner (
    cluster('engsrvprod.kusto.windows.net').database('engineeringdata').Jobs
    //| where ((Branch == 'refs/heads/main') or (Branch == 'refs/heads/master') or (includePR and (Source startswith "pr/")))
    | where Type startswith "test/functional/cli/" and not(Properties contains "runtime-staging")
    | summarize arg_max(Finished, Properties, Type, Branch, Source, Started, QueueName) by JobId
    | project-rename JobType = Type
  ) on JobId;

missymessa avatar Sep 23 '22 17:09 missymessa
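
One way to see which days are missing from Kusto is to count the ingested results for this test per day and compare the output against the Runfo view. The query below is a minimal sketch, not something posted in the thread: it reuses only the tables and columns from the query above, assumes `Finished` is a datetime, and the 30-day window and daily bin are illustrative choices.

```kusto
// Sketch only: per-day count of ingested results for the test, so that
// days Runfo reports but Kusto lacks show up as gaps.
// Assumption: Finished is a datetime column on the Jobs table.
cluster('engsrvprod.kusto.windows.net').database('engineeringdata').AzureDevOpsTests
| where TestName contains 'FileSystemWatcher_Directory_Delete_MultipleFilters'
| distinct JobId, WorkItemId, TestName, Outcome
| join kind=inner (
    cluster('engsrvprod.kusto.windows.net').database('engineeringdata').Jobs
    // Keep one row per job, as in the original query.
    | summarize arg_max(Finished, Type) by JobId
  ) on JobId
| where Finished > ago(30d)
| summarize Results = count() by Day = bin(Finished, 1d), Outcome
| order by Day asc
```

Days that appear in Runfo but have no row here would point to results that never reached the `AzureDevOpsTests` table, rather than to a filter in the original query.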

It looks like the switch to the new project has broken everything. The performance is SO slow that we spent four hours trying to process build 23975, and were unable to finish before something killed us. That's not surprising; 4 hours is a LONG time.

But it also means we are likely losing a LOT of test results... And given Karel's sample set, I would say we are losing most test results. I'm going to escalate this to FR and mark it critical. The performance problems with the new cluster need to get resolved.

ChadNedzlek avatar Sep 23 '22 17:09 ChadNedzlek

@alexperovich, do you know the best route to get this escalated?

ChadNedzlek avatar Sep 23 '22 17:09 ChadNedzlek

cc/ @karelz for visibility

Also, thoughts @garath on how we could proactively report/alert on this in the future?

markwilkie avatar Sep 23 '22 17:09 markwilkie

The only thing to do is open an ICM, I think. I don't know what is configured differently between these 2 orgs.

alexperovich avatar Sep 23 '22 20:09 alexperovich

Hi Chad Nedzlek, we have increased the database ServiceObjective from BC_GEN5_32 to BC_GEN5_80 to help improve the performance. Hopefully you should see improvement here.

Hopefully this will fix it?

ChadNedzlek avatar Sep 26 '22 16:09 ChadNedzlek

I think this got fixed. @AlitzelMendez said that our times have come down significantly since the update. I'm going to close this and we can keep an eye out for other issues.

ChadNedzlek avatar Oct 06 '22 20:10 ChadNedzlek