cpp_client_telemetry
cpp_client_telemetry copied to clipboard
Pause Manager causes events not being uploaded on FlushAndTeardown
Problem Description
It seems like a change from 2021 is affecting the FlushAndTeardown
sequence:
https://github.com/microsoft/cpp_client_telemetry/pull/829
Steps to reproduce.
- Set
maxTeardownUploadTimeInSec
to 2-3-5 seconds. - Run a very short-lived testapp.
- Observe the
FlushAndTeardown
method spinning aimlessly and not actually uploading.
What is the expected behavior?
Expected behavior is that the 2-3 seconds are spent on trying to upload pending requests. Once there are zero events left, app should unblock and exit. Ideally with evt_stats
event turned off, it should be relatively fast exit. Why need to turn evt_stats
off - it's an oversubscribed event that'd generate 401/403, thus it's a waste. And should be removed out of the equation.
What is the actual behavior?
SDK spins and doesn't upload anything at the end of the run.
Additional context.
I suspect these pieces of code related to PauseGuard
are causing the problem:
https://github.com/microsoft/cpp_client_telemetry/blob/0177277d1dc194a7703ca87d40587b6148608b45/lib/tpm/TransmissionPolicyManager.cpp#L105
Because activity is paused here at the beginning of the FlushAndTeardown
method, without actually waiting for events cache to be fully drained:
https://github.com/microsoft/cpp_client_telemetry/blob/0177277d1dc194a7703ca87d40587b6148608b45/lib/api/LogManagerImpl.cpp#L380
Instead of pausing - it's supposed to wait for the amount of time configured. Obviously FlushAndTeardown
can't be called on mobile platforms on the main thread. It has to be called on the background thread. Calling this on foreground thread may cause ANR, and app getting killed.
The way it stands, it seems like short-lived apps and processes would experience "data loss", because their events get stuck and not attempted to be uploaded at the end of the short-lived app run.