opentelemetry-rust icon indicating copy to clipboard operation
opentelemetry-rust copied to clipboard

Tracing SDK - Processors get clone of SpanData which is incorrect

Open cijothomas opened this issue 10 months ago • 4 comments

https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-sdk/src/trace/span.rs#L232-L233

Each processor gets a clone

  1. This violates the spec as changes done by one Processor is not reflected in the next.
  2. Performance issues due to clone.

Log SDK has changed the design to pass &mut instead which is performant and is spec-compliant. Cloning can be done for any processor (like BatchProcessor), if it want to store it for batching purposes.

cijothomas avatar Feb 27 '25 18:02 cijothomas

@TommyCpp Opened this as discussed in the community call.

cijothomas avatar Feb 27 '25 18:02 cijothomas

Actually to me the current API fits the spec well. The spec says Even if the passed Span may be technically writable, since it's already ended at this point, modifying it is not allowed. link. So changes done by one Processor should probably not be reflected in the next Owned data is technically "mutable", but the mutation only affects the copy you are holding which is in the spirit of the spec IMO.

Passing an &mut to on_end creates two new issues:

  • From just the API it's not obvious that mutable should not be called. A badly written SpanProcessor could modify the data and potentially cause non-deterministic behaviours depending on the order in which SpanProcessors are called
  • Passing a reference instead of owned data means that at least one extra copy is still needed to pass the data to the exporter. In particular, the BatchExporter needs to own the SpanData because of it's asynchronicity. This extra copy currently doesn't happen if you only have a single SpanProcessor https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-sdk/src/trace/span.rs#L224

I think we could design better api could be designed because there are two very different use cases (which are not really handled by the spec, since it mostly considers languages with a GC):

  • SpanProcessor that need to read the span data in on_end (or do nothing...)
  • SpanProcessor that need to take ownership of the span data in on_end

It doesn't make sense to give a cloned value to SpanProcessor that only need to just read, and a reference that SpanProcessor that need to take ownership if no other SpanProcessor accesses the data latter...

So I'm thinking of something like this:

trait SpanProcessor {
  /// FinishedSpan is passed mutably, but the span data itself cannot mutated because the field is private
  fn on_end(&self, span: &mut FinishedSpan);
}

pub struct FinishedSpan {
  data: Option<SpanData>,
  is_last_processor: bool,
  is_data_consumed: bool,
}

impl FinishedSpan {
  /// returns an read only reference to the span data
  fn read(&self) -> &SpanData {
    if self.is_data_consumed { panic!("") }
    return &self.data.unwrap();
  }

  /// returns an owned copy of the data
  /// This function will panic if it is called twice within the same SpanProcessor::on_end function
  /// If it is called by the last span processor in the list it will not allocate new data 
  fn consume(&mut self) -> SpanData {
    if self.is_data_consumed { panic!("") }
    self.data_consumed = true;
    if self.is_last_processor {
      self.data.take()
    } else {
      self.data.unwrap().clone()
    }
  }
}

// Then there are two cases:

/// Read only on_end span processor
impl SpanProcessor for SpanEndLogger {
  fn on_end(&self, span: &mut FinishedSpan) {
    // does not allocate if there are mutliple processors
    println!("{:?}", span.read().span_context.trace_id());
  } 
}

/// Read only on_end span processor
impl SpanProcessor for SimpleProcessor {
  fn on_end(&self, span: &mut FinishedSpan) {
    // does not allocate if this is the only processor left
    self.exporter.export(span.consume());
  } 
}

paullegranddc avatar Apr 22 '25 18:04 paullegranddc

pub struct FinishedSpan {
  data: Option<SpanData>,
  is_last_processor: bool,
  is_data_consumed: bool,
}

Can we use a CoW pointer here? Seems the concept is simliar enough(don't clone/own until need to consume it)

TommyCpp avatar Apr 29 '25 03:04 TommyCpp

Can we use a CoW pointer here? Seems the concept is simliar enough(don't clone/own until need to consume it)

I don't think it's possible because I think we want the opposite from what Cow is useful for. Cow is used to give data to a consumer "you don't have to care wether it's owned/borrowed"

Whereas here we want to always give owned data, but to be able to reuse the value if it's not actually consumed.

paullegranddc avatar Apr 29 '25 14:04 paullegranddc