brave icon indicating copy to clipboard operation
brave copied to clipboard

ArrayIndexOutOfBoundsException from MutableSpan

Open karkum opened this issue 1 year ago • 1 comments

Describe the Bug

When adding tags to an ongoing span, we occasionally see the following exception:

java.lang.ArrayIndexOutOfBoundsException: Index 4 out of bounds for length 4
  at brave.handler.MutableSpan.tag(MutableSpan.java:883)
  at brave.RealSpan.tag(RealSpan.java:112)
  at brave.SpanCustomizerShield.tag(SpanCustomizerShield.java:35)
  at brave.CurrentSpanCustomizer.tag(CurrentSpanCustomizer.java:41)

I believe this occurs in rare race-conditions in which some tags are removed while others are being added. I will attempt to reproduce this with a test, but I'm not having any luck so far. Wanted to create this issue and see if others are seeing anything similar.

karkum avatar May 10 '23 19:05 karkum

Hey @karkum , I would like to work on this issue. Can you assign it on me?

Saravana-PS avatar May 14 '23 10:05 Saravana-PS

Hello @karkum is anyone working in this?

SomayajuluSharma avatar Mar 31 '24 03:03 SomayajuluSharma

I think @reta is looking to remove synchronized for a different concurrent pattern, so whatever is missing here probably a good time to fix it without resurrecting synchronized ;)

codefromthecrypt avatar Mar 31 '24 11:03 codefromthecrypt

so currently it should be guarded by this in RealSpan

  @Override public Span tag(String key, String value) {
    synchronized (state) {
      state.tag(key, value);
    }
    return this;
  }

ScopedSpan is unguarded, but that's also documented as not thread safe.

The culprit is Tag.tag which accesses the mutable span without synchronizing it. @reta lemme know if you want me to add synchronized to undo later, or if you feel you're close to removing synchronized. I kindof lean towards doing synchronized and releasing a patch to fix it..

codefromthecrypt avatar Mar 31 '24 11:03 codefromthecrypt

@codefromthecrypt just picked https://github.com/openzipkin/brave/issues/1426 up, will address this issue as well, thanks a lot for figuring out the cause (and the fix).

reta avatar Mar 31 '24 14:03 reta

I kindof lean towards doing synchronized and releasing a patch to fix it..

I was looking into Tag.tag, particularly this snippet:

   if (span instanceof SpanCustomizer) {
      ((SpanCustomizer) span).tag(key, value);
    } else if (span instanceof MutableSpan) {
      ((MutableSpan) span).tag(key, value);
    }

I think it is rather difficult to fix it it just with synchronized or any kind of locking here: yes, we could synchronize over span instance (I think this is your idea), but the MutableSpan.tag could still be accessed directly and this access will be unguarded. Does it make sense? Thank you.

reta avatar Mar 31 '24 14:03 reta

might have been a red herring.. for example, a user cannot access the mutable span until it is inside a span handler, which should already be locked at that point. sorry if I led you down the wrong path.

Basically we don't want to lock fine grained mutablespan as it will explode overhead in ways maybe our benchmarks aren't written to show. If we locked internally in mutablespan, there are existing code which iterate through all things and update many things. So, better to change nothing until sure.

codefromthecrypt avatar Apr 02 '24 02:04 codefromthecrypt

Thanks @codefromthecrypt

might have been a red herring.. for example, a user cannot access the mutable span until it is inside a span handler, which should already be locked at that point. sorry if I led you down the wrong path.

Aha I see, will look more closely into it (definitely not suggesting we locked internally in mutablespan at the moment, I agree)

reta avatar Apr 02 '24 02:04 reta