cpp_client_telemetry
CPU usage crash on calling the flush API
Describe your environment. Describe any aspect of your environment relevant to the problem, including your SDK version, platform, OS version, etc. If you're reporting a problem with a specific version of a library in this repo, please check whether the problem has been fixed on the main branch.
OneDS SDK Version: 3.6.187
Steps to reproduce.
Describe exactly how to reproduce the error. Include a code sample if applicable.
This mainly happens on the [ODWLogManager flush] call, during a lock wait. We are not able to reproduce it, but we are getting reports from internal users. Please find attached the symbolicated logs.

What is the expected behavior? What did you expect to see? No crash due to CPU usage
What is the actual behavior? What did you see instead? Seeing CPU usage crashes on the flush API call
Additional context. Add any other context about the problem here. Crash.txt
Related issue: #754
@lalitb - although it may not be related, I noticed that about 1.5 years ago there was a change in the shutdown sequence (zombie-loggers related) that could cause the SDK to get into a state where it waits for upload but doesn't actually upload anything. Although that should not cause any negative consequences, it would appear that the app is "frozen" while idle-waiting for the flush and teardown timer to expire. I hit it in some tests with my product, and I might have a patch that pushes events through faster, so the app exits faster without waiting for the entire flush and teardown duration.
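To illustrate the behavior described above, here is a minimal sketch of a teardown wait that returns as soon as nothing is pending instead of idling until the timer expires. It is not code from the OneDS SDK and not the patch mentioned in the comment; `PendingUploads`, `add`, `markUploaded`, and `waitUntilDrained` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for an SDK's pending-upload bookkeeping.
// This is NOT the OneDS implementation; it only shows the waiting pattern.
class PendingUploads {
public:
    // Record that n events are queued for upload.
    void add(std::size_t n) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_ += n;
    }

    // Record that n events were uploaded and wake up any waiter.
    void markUploaded(std::size_t n) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(n <= pending_);
            pending_ -= n;
        }
        drained_.notify_all();
    }

    // Wait until nothing is pending or maxWait elapses, whichever is first.
    // Returns true if the queue drained, false if the wait timed out.
    bool waitUntilDrained(std::chrono::milliseconds maxWait) {
        std::unique_lock<std::mutex> lock(mutex_);
        return drained_.wait_for(lock, maxWait,
                                 [this] { return pending_ == 0; });
    }

private:
    std::mutex mutex_;
    std::condition_variable drained_;
    std::size_t pending_ = 0;
};
```

With this pattern, a teardown that has nothing left to upload returns immediately, and the timeout only bounds the worst case where uploads never complete.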
@maxgolov Is the patch part of the OneDS SDK or the client app? Could you share more details on this?
When I check the crash logs, it looks like pauseTransmission is waiting for a lock to be released.
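For context on the lock wait, here is a generic sketch of a bounded lock acquisition, where a pause call gives up after a deadline instead of blocking indefinitely. This is not a diagnosis of the crash and not how the SDK implements pauseTransmission; `Transmitter`, `tryPause`, `resume`, and `isPaused` are hypothetical names, and the use of `std::timed_mutex` is an assumption made for the example.

```cpp
#include <chrono>
#include <mutex>

// Hypothetical transmitter state guarded by a timed mutex.
// NOT the OneDS implementation; it only shows a bounded lock wait.
class Transmitter {
public:
    // Try to pause transmission, waiting at most maxWait for the lock.
    // Returns false if the lock could not be acquired in time.
    bool tryPause(std::chrono::milliseconds maxWait) {
        // This constructor calls try_lock_for(maxWait) on the mutex.
        std::unique_lock<std::timed_mutex> lock(stateMutex_, maxWait);
        if (!lock.owns_lock()) {
            return false;  // Caller can log, retry, or skip the pause.
        }
        paused_ = true;
        return true;
    }

    // Resume transmission; blocks until the lock is available.
    void resume() {
        std::lock_guard<std::timed_mutex> lock(stateMutex_);
        paused_ = false;
    }

    // Report whether transmission is currently paused.
    bool isPaused() {
        std::lock_guard<std::timed_mutex> lock(stateMutex_);
        return paused_;
    }

private:
    std::timed_mutex stateMutex_;
    bool paused_ = false;
};
```

A caller on the main thread could use the `false` return to avoid stalling long enough to trip an OS watchdog, at the cost of handling a pause that did not take effect.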
These are the related bugs:
https://github.com/microsoft/cpp_client_telemetry/issues/1047
https://github.com/microsoft/cpp_client_telemetry/issues/1077
output.txt
@lalitb @maxgolov Gentle reminder on this
@lalitb @nishchith-cp - I don't think it was actually the same issue. The bug that I'm observing is spinning in FlushAndTeardown without uploading anything. In my case the code doesn't get stuck on file write, and doesn't get stuck on Flush. The associated bug is #1120. Sorry for the confusion.