cpp_client_telemetry

CPU usage crash on calling the flush API

Open nishchith-cp opened this issue 2 years ago • 5 comments

Describe your environment. Describe any aspect of your environment relevant to the problem, including your SDK version, platform, OS version, etc. If you're reporting a problem with a specific version of a library in this repo, please check whether the problem has been fixed on the main branch.

OneDS SDK Version: 3.6.187

Steps to reproduce. Describe exactly how to reproduce the error. Include a code sample if applicable. This happens mainly in the [ODWLogManager flush] call, while it is waiting on a lock. We are not able to reproduce it ourselves, but we are getting reports from internal users. Please find the symbolicated logs attached. Screenshot 2023-01-20 at 1 01 55 PM

What is the expected behavior? What did you expect to see? No crash due to CPU usage

What is the actual behavior? What did you see instead? Seeing CPU usage crashes on the flush API call

Additional context. Add any other context about the problem here. Crash.txt

nishchith-cp, Jan 20 '23 07:01

Related issue: #754

lalitb, Jan 20 '23 19:01

@lalitb - although this may not be related, I noticed that about 1.5 years ago there was a change in the shutdown sequence (zombie loggers-related) that could cause the SDK to get into a state where it waits for upload but doesn't actually upload anything. Although that should not cause any negative consequences, the app would appear to be "frozen" while idle-waiting for the flush and teardown timer to expire. I hit it in some tests with my product, and I might have a patch that pushes events through faster, so the app can exit without waiting for the entire flush and teardown duration.
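For illustration only: the difference between idle-waiting for a full timeout and returning as soon as nothing is left to upload can be sketched in plain C++. This is not the patch mentioned above and not SDK code. `PendingEvents`, `markUploaded`, and `waitUntilDrained` are hypothetical names invented for this sketch, which assumes a simple counter of pending events guarded by a mutex.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for an upload queue; NOT part of the OneDS SDK.
class PendingEvents {
public:
    // Record that n more events are waiting to be uploaded.
    void add(std::size_t n) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_ += n;
    }

    // Called by a (hypothetical) uploader once n events have been sent.
    void markUploaded(std::size_t n) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_ = (n >= pending_) ? 0 : pending_ - n;
        }
        drained_.notify_all();  // wake any thread blocked in waitUntilDrained
    }

    // Blocks until nothing is pending or the timeout expires, whichever
    // comes first. Returns true if the queue drained, false on timeout.
    bool waitUntilDrained(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return drained_.wait_for(lock, timeout,
                                 [this] { return pending_ == 0; });
    }

private:
    std::mutex mutex_;
    std::condition_variable drained_;
    std::size_t pending_ = 0;
};
```

With a wait like this, a teardown path that has nothing to upload returns immediately instead of sleeping for the whole timer, and one that does have pending events still gives up after the timeout.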

maxgolov, Feb 14 '23 04:02

@maxgolov Is the patch part of the OneDS SDK or the client app? Could you share more details on this?

When I check the crash logs, it looks like pauseTransmission is waiting for a lock to be released.

These are the related bugs:

https://github.com/microsoft/cpp_client_telemetry/issues/1047
https://github.com/microsoft/cpp_client_telemetry/issues/1077
output.txt

nishchith-cp, Feb 21 '23 09:02

@lalitb @maxgolov Gentle reminder on this

nishchith-cp, Mar 03 '23 04:03

@lalitb @nishchith-cp - I don't think it was actually the same issue. The bug that I'm observing is spinning in FlushAndTeardown without uploading anything. In my case the code doesn't get stuck on file write, and doesn't get stuck on Flush. The associated bug is #1120. Sorry for the confusion.

maxgolov, Sep 13 '23 02:09