datasketches-cpp icon indicating copy to clipboard operation
datasketches-cpp copied to clipboard

Proposal: Add DDSketch (Relative-Error Quantile Sketch)

Open geonove opened this issue 7 months ago • 6 comments

Proposal: Add DDSketch (Relative-Error Quantile Sketch)

Summary:
This issue proposes adding an implementation of DDSketch, a mergeable quantile sketch with relative-error guarantees, to the datasketches-cpp library.

Benefits:

  • Relative-error guarantees
  • Mergeability for distributed processing
  • Predictable memory usage
  • Used in production (Datadog, OpenTelemetry)

References

Proposed Design

  • New class under ddsketch.hpp
  • Logarithmic mapping of input values to buckets using configurable relative accuracy
  • Compact, bounded memory footprint with optional bucket collapsing
  • Mergeable histogram-style structure
  • Serialization and deserialization support
  • Unit tests and benchmarks included

Compatibility

  • No changes to existing APIs
  • Implementation will be self-contained
  • Optional: initial release could be marked experimental

Next Steps

If there is community interest, I’m happy to:

  1. Share a detailed design document
  2. Begin work on the implementation and submit a PR
  3. Iterate based on feedback

Would the maintainers be open to including DDSketch? Are there specific design or compatibility considerations I should address before proceeding?

geonove avatar May 29 '25 12:05 geonove

Current quantile sketches in datasketches-cpp (KLL, REQ, t-digest) offer absolute-error guarantees

REQ sketch offers relative error guarantee.

AlexanderSaydakov avatar May 29 '25 19:05 AlexanderSaydakov

t-digest also prioritizes high and low rank accuracy see https://github.com/apache/datasketches-cpp/issues/416

AlexanderSaydakov avatar May 29 '25 19:05 AlexanderSaydakov

I also found this discussion some years ago: https://github.com/apache/datasketches-java/issues/357

AlexanderSaydakov avatar May 29 '25 19:05 AlexanderSaydakov

Thank you for the responses and the helpful references to prior discussions @AlexanderSaydakov — this has been very informative. I’m still interested in exploring a DDSketch implementation, even if just as an experimental project.

Based on the discussion you mentioned, I’ll reflect deeply on whether (and how) it might be possible to make somewhat fair comparisons between sketches with different goals and guarantees.

Please let me know if there could still be interest in this kind of work, or if there’s openness to reopening the discussion at some point down the line.

geonove avatar May 29 '25 21:05 geonove

Of course, any new sketches are welcome if they have solid research behind them and have substantially different trade-offs compared to ones already implemented in the library. As I understand, closest competitors to DDSketch are REQ sketch and t-digest. Do you think that DDSketch can be a better choice for some applications?

AlexanderSaydakov avatar May 29 '25 21:05 AlexanderSaydakov

I believe it offers accuracy guarantees that t-digest cannot, while providing dual-tailed relative error. It’s numeric so it can do things REQ can’t (two-sided) and is likely smaller, too.

I think it’s a reasonable sketch to include. But we’d want to ensure the API uses similar naming as our existing sketches.

jmalkin avatar May 30 '25 00:05 jmalkin