opentelemetry-rust
opentelemetry-rust copied to clipboard
[For discussion] Support LogProcessors chaining through mutable reference
Fixes #1650 As discussed here, this PR modifies LogProcessor to take mutable reference to LogData, and further pass it as Owned/Borrowed type to the Exporter.
Existing:
pub trait LogProcessor: Send + Sync + Debug {
// ...
fn emit(&self, data: LogData);
// ...
}
#[async_trait]
pub trait LogExporter: Send + Sync + Debug {
// ...
async fn export(&mut self, batch: Vec<LogData>) -> LogResult<()>;
// ..
This PR:
pub trait LogProcessor: Send + Sync + Debug {
// ...
fn emit(&self, data: &mut LogData);
// ..
}
#[async_trait]
pub trait LogExporter: Send + Sync + Debug {
// ...
async fn export<'a>(&mut self, batch: Vec<Cow<'a, LogData>>) -> LogResult<()>;
// ...
}
- BatchLogProcessor clones the received mutable references of
LogDatabefore enqueueing them and passes this cloned data to the exporter. - SimpleLogProcessor passes received mutable references of
LogDatadirectly to the exporter as borrowed type, without cloning. - Exporters accept both borrowed and owned types of
LogDataand process them accordingly. - These changes:
- Enables passing modified LogData between processors in a chain.
- Allows LogData to be passed from the SDK to the exporter without cloning, such as with
SimpleLogProcessorand for exporters likeopentelemetry-user-events-logsandopentelemetry-etw-logsthat use areentrant log processor.
This also adds certain constraints for implementing custom exporters and log-processors, else their compilation will fail:
Exporters:
The lifetime parameter 'a in Cow<'a, LogData> ensured that exporter do not retain references to the logdata beyond its intended lifespan. If there is a need to modify or store LogData beyond the duration of the export call, the data must be cloned.
Asynchronous Log-Processors:
When a LogProcessor processes LogData asynchronously, such as in the case of BatchLogProcessor, there is a risk that the data may outlive the scope from which it was passed. To prevent any issues, LogData should always be cloned before any asynchronous processing begins.
Synchronous Log-Processors:
For synchronous processing within a LogProcessor, as in SimpleLogProcessor, it is important to ensure that references to LogData are contained within the bounds of the emit function. The LogData should be passed as a borrowed reference if all processing is completed within the emit function.
Changes
Please provide a brief description of the changes here.
Merge requirement checklist
- [ ] CONTRIBUTING guidelines followed
- [ ] Unit tests added/updated (if applicable)
- [ ] Appropriate
CHANGELOG.mdfiles updated for non-trivial, user-facing changes - [ ] Changes in public API reviewed (if applicable)
Also the result of stress test. There is perf improvement as we no longer do clone in log_emitter.rs.
Existing:
Number of threads: 16
Throughput: 36,961,600 iterations/sec
Throughput: 37,929,800 iterations/sec
PR:
Number of threads: 16
Throughput: 64,182,600 iterations/sec
Throughput: 64,921,000 iterations/sec
Throughput: 63,776,800 iterations/sec
will compare and add bench too.
Benchmark
| Test Name | Main | PR |
|---|---|---|
| simple-log/no-context | [128.97 ns] | [89.746 ns] |
| simple-log/with-context time | [130.56 ns] | [89.683 ns] |
| simple-log-with-int/no-context | [171.34 ns] | [106.89 ns] |
| simple-log-with-int/with-context | [ 173.99 ns] | [106.38 ns] |
| simple-log-with-double/no-context | [172.04 ns] | [106.54 ns] |
| simple-log-with-double/with-context | [173.93 ns] | [106.53 ns] |
| simple-log-with-string/no-context | [172.15 ns] | [107.70 ns] |
| simple-log-with-string/with-context | [174.03 ns] | [106.98 ns] |
| simple-log-with-bool/no-context | [174.00 ns] | [111.34 ns] |
| simple-log-with-bool/with-context | [178.27 ns] | [109.40 ns] |
| simple-log-with-bytes/no-context | [209.17 ns] | [131.57 ns] |
| simple-log-with-bytes/with-context | [211.31 ns] | [131.33 ns] |
| simple-log-with-a-lot-of-bytes/no-context | [216.39 ns] | [126.85 ns] |
| simple-log-with-a-lot-of-bytes/with-context | [218.21 ns] | [124.83 ns] |
| simple-log-with-vec-any-value/no-context | [285.70 ns] | [165.38 ns] |
| simple-log-with-vec-any-value/with-context | [282.09 ns] | [169.18 ns] |
| simple-log-with-inner-vec-any-value/no-context | [393.67 ns] | [223.43 ns] |
| simple-log-with-inner-vec-any-value/with-context | [391.62 ns] | [224.96 ns] |
| simple-log-with-map-any-value/no-context | [328.11 ns] | [177.65 ns] |
| simple-log-with-map-any-value/with-context | [329.72 ns] | [176.94 ns] |
| simple-log-with-inner-map-any-value/no-context | [482.51 ns] | [257.19 ns] |
| simple-log-with-inner-map-any-value/with-context | [484.81 ns] | [252.65 ns] |
| long-log/no-context | [128.64 ns] | [90.043 ns] |
| long-log/with-context | [129.22 ns] | [89.758 ns] |
| full-log/no-context | [140.29 ns] | [98.945 ns] |
| full-log/with-context | [139.50 ns] | [99.310 ns] |
| full-log-with-4-attributes/no-context | [311.59 ns] | [196.60 ns] |
| full-log-with-4-attributes/with-context | [299.92 ns] | [190.10 ns] |
| full-log-with-9-attributes/no-context | [517.14 ns] | [365.58 ns] |
| full-log-with-9-attributes/with-context | [527.35 ns] | [367.39 ns] |
| full-log-with-attributes/no-context | [421.30 ns] | [248.99 ns] |
| full-log-with-attributes/with-context | [426.44 ns] | [239.60 ns] |
Thanks for looking into this! The perf win is amazing!
Would the below statements be accurate?
- LogProcessors now get a mutable ref to LogData. If they want to make changes to LogData, they are allowed to do so, but they need to be aware that it'll be reflected in other processors registered after them.
- If a LogProcessor is an Exporting processor, and it has to batch LogData, then it must clone the data. Due to cloning, any changes made to the original one after the processor is invoked won't be visible to this LogProcessor.
- Currently the clone cost is paid by the SDK itself. But this PR makes it possible to write pipeline without any cloning (simple/re-entrant processors who export sync.). BatchingProcessors need to clone, but that is no worse than current.
- The most common use case of 1 or more enrichment/redactor processor, followed by BatchProcessor would be much faster with this change. Infact this was not even feasible before as enrichment/redactions got applied to the clone only
- The high-perf use case of 1 or more enrichment/redactor processor, following by Re-entrantProcessors would be much much faster now as we completely eliminate cloning.
Robert's suggestion of offering the ability to run independent pipelines (one pipeline with redaction processor exporting redacted data, and another pipeline exporting without redaction is possible with this change also, but it requires cloning to ensure the changes are only affecting that pipeline only)
Would the below statements be accurate?
Nicely summarised the changes :) Yes to all.
Codecov Report
Attention: Patch coverage is 79.19463% with 31 lines in your changes are missing coverage. Please review.
Project coverage is 73.6%. Comparing base (
33abef2) to head (af99199).
Additional details and impacted files
@@ Coverage Diff @@
## main #1726 +/- ##
======================================
Coverage 73.6% 73.6%
======================================
Files 123 123
Lines 19500 19605 +105
======================================
+ Hits 14362 14442 +80
- Misses 5138 5163 +25
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Have added unit-test to validate LogProcessor chaining and updating of LogData. This is ready for review now. Also added changelog and benchmark in the description.
Please update the stress test numbers with this change, so it is easy to track.
Also https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-appender-tracing/benches/logs.rs#L13
Done, also updated the results in the description.