ApplicationInsights-dotnet
Flush doesn't block until messages are actually flushed
TelemetryClient.Flush() doesn't actually block returning until the messages are flushed. This means that if you want to flush messages during application shutdown (like in Microsoft/ApplicationInsights-aspnetcore#304), you have to add a Thread.Sleep(1000), which isn't ideal.
It would be nice if there were a blocking version of this API that took an optional timeout, so that it blocks only until the events are flushed or the timeout is hit, and application shutdown isn't delayed any longer than necessary.
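For context, a minimal sketch of the shutdown pattern being complained about (assuming a `telemetryClient` field and the default ServerTelemetryChannel):

```csharp
using System.Threading;

// Flush() queues the remaining items but may return before they are sent,
// so an arbitrary sleep is added as a grace period before the process exits.
telemetryClient.Flush();
Thread.Sleep(1000); // there is no principled way to pick this value
```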
If the channel used is InMemoryChannel, then Flush() is blocking. But AI documentation suggests invoking Thread.Sleep() after flush during app shutdown.
We can modify the API to have both sync and async versions of the Flush() API, and update the documentation accordingly.
@DamianEdwards Did you use InMemoryChannel? If yes, Flush() should be blocking and there would be no need to wait afterwards. But for ServerTelemetryChannel, Flush() is async, and you need to sleep for a while after calling it.
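A sketch of the InMemoryChannel approach, assuming the default channel is swapped at startup (note that InMemoryChannel does not give you the retry and disk persistence that ServerTelemetryChannel provides):

```csharp
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.Extensibility;

// Replace the default channel so that Flush() sends buffered items synchronously.
TelemetryConfiguration.Active.TelemetryChannel = new InMemoryChannel();
var client = new TelemetryClient(TelemetryConfiguration.Active);

client.TrackTrace("shutting down");
client.Flush(); // with InMemoryChannel this blocks until the batch is transmitted
```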
We're seeing the same thing with our WebJobs/Functions implementation. We're looking for a reliable way to flush the client when our host exits.
This is really important IMHO. Exceptions that are about to crash your process (and hence require flushing) are arguably the most important ones to track.
In addition, it seems like the current implementation falls between two stools - on one hand it's synchronous so you block your thread (https://github.com/Microsoft/ApplicationInsights-dotnet/issues/482), but on the other hand delivery isn't guaranteed so it might as well have been asynchronous (ideally returning a Task you could wait on).
Like the OP said, as things stand now you have to add some random Thread.Sleep(N) where of course there is really no way to know what N should be...
@cijothomas @SergeyKanzhelev I wonder if something like this could be used as a workaround for the case of an exception about to crash the process?
```csharp
try
{
    // something important
}
catch (Exception fatal)
{
    // retain original channel
    var serverChannel = TelemetryConfiguration.Active.TelemetryChannel;
    // switch to InMemoryChannel for guaranteed delivery of the offending exception
    TelemetryConfiguration.Active.TelemetryChannel = new InMemoryChannel();
    telemetryClient.TrackException(fatal);
    telemetryClient.Flush();
    // now that the most important part was tracked, optimistically try and drain the queue
    serverChannel.Flush();
    Thread.Sleep(1000);
    // shutdown/crash process
    throw;
}
```
@ohadschn this is what we are doing here: https://github.com/Microsoft/ApplicationInsights-dotnet-server/blob/75373e57dcf8d646c54ee188461c373f2cc98939/Src/WindowsServer/WindowsServer.Shared/UnhandledExceptionTelemetryModule.cs#L71-L86 So you can just use the unhandled exception telemetry module.
@SergeyKanzhelev This is great stuff, I'll make sure we use it. However, in the Service Fabric (SF) case I believe the module won't be enough as SF catches RunAsync exceptions and calls Environment.FailFast which won't trigger the unhandled exception handlers. So I basically copied the handler code to my top-most try/catch.
We also just ran into this - we lost some application crash exceptions. 😢
The current behavior doesn't seem intuitive/expected to me so I'd say this should be treated as a "bug" instead of an "enhancement".
Also, it's the most wanted issue in this repo, so it would be great if this could get some love soon!
@cwe1ss note that specifically for application crashes there is a pretty simple workaround, see Sergey's last comment for the code. Basically initialize a new TelemetryClient with an InMemoryChannel, use it for your TrackException, and finally Flush it. Delivery should be guaranteed (as long as the process didn't crash before flushing was complete of course).
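A sketch of that workaround, assuming the instrumentation key is copied from the active configuration and `fatalException` stands for the exception being handled (a separate configuration keeps the regular ServerTelemetryChannel pipeline untouched):

```csharp
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.Extensibility;

// Dedicated "crash" client over InMemoryChannel so Flush() blocks until the send completes.
var crashConfig = new TelemetryConfiguration
{
    InstrumentationKey = TelemetryConfiguration.Active.InstrumentationKey,
    TelemetryChannel = new InMemoryChannel()
};
var crashClient = new TelemetryClient(crashConfig);

crashClient.TrackException(fatalException);
crashClient.Flush(); // synchronous with InMemoryChannel
```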
@ohadschn yes, we will use this workaround for now.
But still, I think this is a very important issue as it's just frustrating for everyone who doesn't know about this behavior - which probably will be almost everyone.
I think it would be even better if TelemetryClient implemented IDisposable and automatically flushed on Dispose (with some timeout, of course). No one likes lost exceptions. This would also play nicer with dependency injection tools like Microsoft.Extensions.DependencyInjection that automatically dispose their services.
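Not something the SDK offers, but a hypothetical wrapper along the lines of this suggestion (the class name and the bounded wait are invented for illustration):

```csharp
using System;
using System.Threading;
using Microsoft.ApplicationInsights;

// Hypothetical: wraps TelemetryClient so DI containers that dispose their
// services get a best-effort flush on shutdown.
public sealed class FlushingTelemetryClient : IDisposable
{
    private readonly TelemetryClient _client;
    private readonly TimeSpan _flushGracePeriod;

    public FlushingTelemetryClient(TelemetryClient client, TimeSpan flushGracePeriod)
    {
        _client = client;
        _flushGracePeriod = flushGracePeriod;
    }

    public TelemetryClient Client => _client;

    public void Dispose()
    {
        _client.Flush();
        // Flush() may return before delivery, so wait a bounded grace period.
        Thread.Sleep(_flushGracePeriod);
    }
}
```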
@cwe1ss In AI's defense, almost everyone will be covered by the unhandled exception module which is enabled by default in Microsoft.ApplicationInsights.WindowsServer and Microsoft.ApplicationInsights.Web (which depends on the former), by far the most prevalent AI packages (according to nuget.org statistics). Of course there are still many users of other packages, such as ASP.NET Core where this module won't be enabled by default. And there are cases (like Service Fabric) where unhandled exceptions won't reach the top stack frame and thus won't be caught by the module (plus there's no special exception handling in the AI SF package). In such cases, it's the developer's responsibility to implement this workaround.
As for TelemetryClient : IDisposable it makes sense for sure, though keep in mind that many developers probably have static telemetry clients (indeed, the official docs recommend "an instance of TelemetryClient for each module of your app") where it won't help much. Maybe AI should hook AppDomain.ProcessExit...
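A sketch of that idea for a static client (keep in mind the runtime gives ProcessExit handlers only a short time budget, so this remains best-effort):

```csharp
using System;
using System.Threading;

// Best-effort flush of a static telemetryClient when the process exits normally.
AppDomain.CurrentDomain.ProcessExit += (sender, e) =>
{
    telemetryClient.Flush();
    Thread.Sleep(TimeSpan.FromSeconds(2)); // grace period; delivery is still not guaranteed
};
```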
True. I'm in the latter group (ASP.NET Core, Service Fabric) so I'll probably just get hit by a lot more of these issues :(
@cwe1ss I am in the same group, so let's keep in touch if we run into related issues. In the meantime, might I suggest you upvote some of the issues here (and create new ones as applicable): https://github.com/Microsoft/ApplicationInsights-servicefabric/issues, https://github.com/Microsoft/ApplicationInsights-aspnetcore/issues.
I really want to see a blocking flush as well. We are using MSAI in a Win32 application via a C++/CLR library, and we would like to ensure that the buffered and persisted events are sent when the application exits.
Stumbled upon this thread while considering adding AI to our Windows Service app. A Flush() that may return before messages are actually flushed to AI is misleading to me. In considering AI, my thoughts were:
- I'd like to use AI; I wonder if it'll block for network on every Track* call. That would be bad.
- Oh, there's a Flush() method, so it must be batching stuff. Good!
- That means I'll have to call Flush() myself right before the app closes.
This thread seems to say that calling Flush() probably won't accomplish that goal. @ohadschn mentioned, "In such cases, it's the developer's responsibility to implement this workaround," and my response to that is: "The developer's responsibility would be to call Flush(). Upon successful return of Flush() they have a reasonable expectation that they're not going to lose any AI messages."
That's my perspective on it, anyway. I agree with @cwe1ss that this is more of a bug than an enhancement, and wanted to add a comment because the thread seems to be going the way of, "There's maybe some value here, but it's not really broken, and devs are responsible to use a [non-obvious] workaround." I hope the perspective of someone new to AI helps.
The Flush() should be synchronous. A user can easily turn a synchronous call into an asynchronous one but not the other way around. Right now there is no way to guarantee that all messages are delivered.
The mention of Thread.Sleep(), even in Microsoft code samples, looks really unprofessional.
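To illustrate the earlier point about wrapping a synchronous call: if Flush() were truly synchronous, a caller who wants asynchrony could simply offload it (assuming a `telemetryClient` instance):

```csharp
// Offloading a (hypothetically) synchronous Flush() to the thread pool.
await Task.Run(() => telemetryClient.Flush());
```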
I up-vote this issue too. It is not intuitive for the async Flush to be named Flush instead of FlushAsync, let alone that we lack a sync implementation too :(.
This is what happens when you give the intern too much responsibility 🤣
This should be the API:
Task FlushAsync(CancellationToken);
You can await it, Wait it or cancel it.
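Hypothetical usage of that proposed signature during shutdown (FlushAsync does not exist in the current SDK; the five-second timeout is just an example):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Bound the shutdown wait with a cancellation token.
using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5)))
{
    await telemetryClient.FlushAsync(cts.Token);
}
```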
@paulomorgado that comment should go on this other issue: https://github.com/Microsoft/ApplicationInsights-dotnet/issues/482
Hi all, has this Flush() problem been fixed in the latest AI sdk release? We ran into the same problem with version 2.10.
Nothing has changed in Flush in this release. I expect some update by the end of the year.
I cloned the master branch from here and looked into the source code of TelemetryClient.Flush(), and it looks like all the internal implementations are synchronous. Was there a reason why the code was not released yet, or have I missed something inside Flush() that still makes it async?
InMemoryChannel is sync. It's the ServerTelemetryChannel which is not sync. It pushes items from an in-memory queue to a Transmitter. The Transmitter has its own queues of in-flight transmissions, in-memory transmissions, and on-disk transmissions - these are not flushed synchronously.
@cijothomas Could you look at this, please? I looked inside the ServerTelemetryChannel, and it seems it actually does wait for the asynchronous task to finish:
https://github.com/microsoft/ApplicationInsights-dotnet/blob/7633ae849edc826a8547745b6bf9f3174715d4bd/BASE/src/ServerTelemetryChannel/ServerTelemetryChannel.cs#L289-L298
What am I missing?
That still doesn't guarantee that the events are sent to the ingestion endpoint. The ServerTelemetryChannel has a Transmitter with its own buffering. TelemetryBuffer.FlushAsync() merely ensures that events are handed off to the Transmitter.
Any ETA on when this will be fixed?
Indeed, when will this be made possible? We are still using version 1.2.3 because it's the last version supporting a synchronous flush.
I don't know much about the 1.2.3 version - but it's quite old and not supported anymore. To get a sync Flush, InMemoryChannel can be used.
We started working on fixing this issue, but hit several challenges. It's still an open PR. We don't have an ETA on when we'll be able to merge it.
What does 'not supported' mean? All of our applications that use this version happily send telemetry data to our endpoint on Azure.