zipkin4net
zipkin4net copied to clipboard
The timer in MutableSpanMap have not proper handling span data.
I have catch this exception in my product env:
Unhandled exception. System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
at zipkin4net.Tracers.Zipkin.WriterExtensions.WriteList[U](StreamWriter writer, SerializeMethod`1 serializer, String fieldName, ICollection`1 elements, IPEndPoint endPoint, String serviceName)
at zipkin4net.Tracers.Zipkin.JSONSpanSerializer.SerializeSpan(StreamWriter writer, Span span)
at zipkin4net.Tracers.Zipkin.JSONSpanSerializer.SerializeTo(Stream stream, Span span)
at OpenPlatform.AspNetCore.Infrastructure.CusJSONSpanSerializer.SerializeTo(Stream stream, Span span) in /src/PlaformServices/Libraries/OpenPlatform.AspNetCore/Infrastructure/ZipkinStartup.cs:line 88
at zipkin4net.Tracers.Zipkin.ZipkinTracerReporter.Report(Span span)
at zipkin4net.Internal.Recorder.MutableSpanMap.FlushOldSpans(DateTime utcNow)
at zipkin4net.Internal.Recorder.MutableSpanMap.<.ctor>b__6_0(Object _)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Threading.TimerQueueTimer.CallCallback(Boolean isThreadPool)
at System.Threading.TimerQueueTimer.Fire(Boolean isThreadPool)
at System.Threading.TimerQueue.FireNextTimers()
I review the source code . and i know this is the problem of the System.Threading.Timer. This timer won't wait the pre task to finish ,and will start next task. That is the problem.
To verification this, I output the span when exception raise. This is show below:
"spanState": {
"traceIdHigh": 0,
"traceId": -7806682531417472614,
"parentSpanId": 6104606790270792834,
"spanId": -3199394417785887997,
"samplingStatus": 1,
"flags": 0,
"sampled": true,
"debug": false,
"extra": []
},
"annotations": [
{
"value": "sr",
"timestamp": "2021-12-13T09:44:18.4564427Z"
},
{
"value": "flush.timeout",
"timestamp": "2021-12-13T09:49:14.2332799Z"
},
{
"value": "flush.timeout",
"timestamp": "2021-12-13T09:49:14.2332922Z"
}
],
.....
There are two "flush.timeout" annotations, their timestamp are very close.
Maybe you should consider when to trigger the timer task.Please fix it.