brave
brave copied to clipboard
ArrayIndexOutOfBoundsException from MutableSpan
Describe the Bug
When adding tags to an ongoing span, we occasionally see the following exception:
java.lang.ArrayIndexOutOfBoundsException: Index 4 out of bounds for length 4
at brave.handler.MutableSpan.tag(MutableSpan.java:883)
at brave.RealSpan.tag(RealSpan.java:112)
at brave.SpanCustomizerShield.tag(SpanCustomizerShield.java:35)
at brave.CurrentSpanCustomizer.tag(CurrentSpanCustomizer.java:41)
I believe this occurs in rare race-conditions in which some tags are removed while others are being added. I will attempt to reproduce this with a test, but I'm not having any luck so far. Wanted to create this issue and see if others are seeing anything similar.
Hey @karkum , I would like to work on this issue. Can you assign it on me?
Hello @karkum is anyone working in this?
I think @reta is looking to remove synchronized for a different concurrent pattern, so whatever is missing here probably a good time to fix it without resurrecting synchronized ;)
so currently it should be guarded by this in RealSpan
@Override public Span tag(String key, String value) {
synchronized (state) {
state.tag(key, value);
}
return this;
}
ScopedSpan is unguarded, but that's also documented as not thread safe.
The culprit is Tag.tag which accesses the mutable span without synchronizing it. @reta lemme know if you want me to add synchronized to undo later, or if you feel you're close to removing synchronized. I kindof lean towards doing synchronized and releasing a patch to fix it..
@codefromthecrypt just picked https://github.com/openzipkin/brave/issues/1426 up, will address this issue as well, thanks a lot for figuring out the cause (and the fix).
I kindof lean towards doing synchronized and releasing a patch to fix it..
I was looking into Tag.tag
, particularly this snippet:
if (span instanceof SpanCustomizer) {
((SpanCustomizer) span).tag(key, value);
} else if (span instanceof MutableSpan) {
((MutableSpan) span).tag(key, value);
}
I think it is rather difficult to fix it it just with synchronized
or any kind of locking here: yes, we could synchronize over span
instance (I think this is your idea), but the MutableSpan.tag
could still be accessed directly and this access will be unguarded. Does it make sense? Thank you.
might have been a red herring.. for example, a user cannot access the mutable span until it is inside a span handler, which should already be locked at that point. sorry if I led you down the wrong path.
Basically we don't want to lock fine grained mutablespan as it will explode overhead in ways maybe our benchmarks aren't written to show. If we locked internally in mutablespan, there are existing code which iterate through all things and update many things. So, better to change nothing until sure.
Thanks @codefromthecrypt
might have been a red herring.. for example, a user cannot access the mutable span until it is inside a span handler, which should already be locked at that point. sorry if I led you down the wrong path.
Aha I see, will look more closely into it (definitely not suggesting we locked internally in mutablespan at the moment, I agree)