newrelic-dotnet-agent
newrelic-dotnet-agent copied to clipboard
.NET Core agent w/ Send Data on Exit can crash application during shutdown
Description
.NET Core applications with Send Data on Exit enabled, can cause applications to crash during shutdown. This has been observed when the collector returns a HTTP 409
response to metric_data
being sent during application shutdown due to the Send Data on Exit option being enabled. In this scenario, the .NET Agent attempts to reconnect to send data, but encounters an exception in the process.
Here is a stack trace from our .NET Core Infinite Tracing tests:
Fatal error. Internal CLR error. (0x80131506)
Repeat 2 times:
--------------------------------
at System.Diagnostics.Tracing.EventPipeInternal.Enable(Char*, System.Diagnostics.Tracing.EventPipeSerializationFormat, UInt32, EventPipeProviderConfigurationNative*, UInt32)
--------------------------------
at System.Diagnostics.Tracing.EventPipeInternal.Enable(System.String, System.Diagnostics.Tracing.EventPipeSerializationFormat, UInt32, System.Diagnostics.Tracing.EventPipeProviderConfiguration[])
at System.Diagnostics.Tracing.EventPipeEventDispatcher.CommitDispatchConfiguration()
at System.Diagnostics.Tracing.EventPipeEventDispatcher.SendCommand(System.Diagnostics.Tracing.EventListener, System.Diagnostics.Tracing.EventCommand, Boolean, System.Diagnostics.Tracing.EventLevel, System.Diagnostics.Tracing.EventKeywords)
at System.Diagnostics.Tracing.EventListener.EnableEvents(System.Diagnostics.Tracing.EventSource, System.Diagnostics.Tracing.EventLevel, System.Diagnostics.Tracing.EventKeywords, System.Collections.Generic.IDictionary`2<System.String,System.String>)
at NewRelic.Agent.Core.Samplers.GCEventsListener.StartListening()
at NewRelic.Agent.Core.Samplers.GCSamplerNetCore.Start()
at NewRelic.Agent.Core.Utilities.EventBus`1[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Publish(System.__Canon)
at NewRelic.Agent.Core.Configuration.ConfigurationService.OnServerConfigurationUpdated(NewRelic.Agent.Core.Events.ServerConfigurationUpdatedEvent)
at NewRelic.Agent.Core.Utilities.EventBus`1[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Publish(System.__Canon)
at NewRelic.Agent.Core.DataTransport.ConnectionManager.Disconnect()
at NewRelic.Agent.Core.DataTransport.ConnectionManager.Reconnect()
at NewRelic.Agent.Core.Utilities.EventBus`1[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Publish(System.__Canon)
at NewRelic.Agent.Core.DataTransport.DataTransportService.TrySendDataRequest[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.String, System.Object[])
at NewRelic.Agent.Core.DataTransport.DataTransportService.Send(System.Collections.Generic.IEnumerable`1<NewRelic.Agent.Core.WireModels.MetricWireModel>)
at NewRelic.Agent.Core.Aggregators.MetricAggregator.Harvest()
at NewRelic.Agent.Core.Aggregators.AbstractAggregator`1[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].OnPreCleanShutdown(NewRelic.Agent.Core.Events.PreCleanShutdownEvent)
at NewRelic.Agent.Core.Utilities.EventBus`1[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Publish(System.__Canon)
at NewRelic.Agent.Core.AgentManager.Shutdown(Boolean)
Here are the agent logs when the Internal CLR error takes place:
2022-06-15 16:30:55,418 NewRelic DEBUG: [pid: 27472, tid: 2] Received a ProcessExit CLR event for the application domain. About to shut down the .NET Agent...
2022-06-15 16:30:55,422 NewRelic DEBUG: [pid: 27472, tid: 2] Starting the shutdown process for the .NET Agent.
2022-06-15 16:30:55,428 NewRelic INFO: [pid: 27472, tid: 2] Metric harvest starting.
2022-06-15 16:30:55,434 NewRelic DEBUG: [pid: 27472, tid: 2] Invoking "metric_data" with : ["bleep": "bloop"]
2022-06-15 16:30:55,600 NewRelic ERROR: [pid: 27472, tid: 2] Received HTTP status code Conflict with message
2022-06-15 16:30:55,602 NewRelic DEBUG: [pid: 27472, tid: 2] Received a 409 Conflict response invoking method "metric_data"
2022-06-15 16:30:55,684 NewRelic ERROR: [pid: 27472, tid: 2] NewRelic.Agent.Core.Exceptions.HttpException: Conflict
at NewRelic.Agent.Core.DataTransport.HttpCollectorWire.ThrowExceptionFromHttpWebResponse(String serializedData, HttpWebResponse response)
at NewRelic.Agent.Core.DataTransport.HttpCollectorWire.SendData(String method, ConnectionInfo connectionInfo, String serializedData)
at NewRelic.Agent.Core.DataTransport.ConnectionHandler.SendDataOverWire[T](ICollectorWire wire, String method, Object[] data)
at NewRelic.Agent.Core.DataTransport.ConnectionHandler.SendDataRequest[T](String method, Object[] data)
at NewRelic.Agent.Core.DataTransport.ConnectionManager.SendDataRequest[T](String method, Object[] data)
at NewRelic.Agent.Core.DataTransport.DataTransportService.TrySendDataRequest[T](String method, Object[] data)
2022-06-15 16:30:55,688 NewRelic DEBUG: [pid: 27472, tid: 2] Invoking "shutdown" with : []
2022-06-15 16:30:55,821 NewRelic DEBUG: [pid: 27472, tid: 2] Received : {"return_value":[]}
2022-06-15 16:30:55,823 NewRelic FINEST: [pid: 27472, tid: 2] Sampler NewRelic.Agent.Core.Samplers.GCSamplerNetCore has been requested to stop.
Expected Behavior Agent gracefully shuts down regardless of collector responses.
Steps to Reproduce
- Enable send data on exit
- Configure the collector to send a 409 status code for
metric_data
during the send data on exit event harvest
Your Environment .NET Core 5 application. Windows. I have seen it happen on multiple versions of .NET Core though.
For Maintainers Only or Hero Triaging this bug Suggested Priority (P1,P2,P3,P4,P5): P4 Suggested T-Shirt size (S, M, L, XL, Unknown): M