Proposal: Add DDSketch (Relative-Error Quantile Sketch)
Proposal: Add DDSketch (Relative-Error Quantile Sketch)
Summary:
This issue proposes adding an implementation of DDSketch, a mergeable quantile sketch with relative-error guarantees, to the datasketches-cpp library.
Benefits:
- Relative-error guarantees
- Mergeability for distributed processing
- Predictable memory usage
- Used in production (Datadog, OpenTelemetry)
References
- VLDB 2019: DDSketch Paper
- Datadog's sketches-java repo
Proposed Design
- New class under
ddsketch.hpp - Logarithmic mapping of input values to buckets using configurable relative accuracy
- Compact, bounded memory footprint with optional bucket collapsing
- Mergeable histogram-style structure
- Serialization and deserialization support
- Unit tests and benchmarks included
Compatibility
- No changes to existing APIs
- Implementation will be self-contained
- Optional: initial release could be marked experimental
Next Steps
If there is community interest, I’m happy to:
- Share a detailed design document
- Begin work on the implementation and submit a PR
- Iterate based on feedback
Would the maintainers be open to including DDSketch? Are there specific design or compatibility considerations I should address before proceeding?
Current quantile sketches in datasketches-cpp (KLL, REQ, t-digest) offer absolute-error guarantees
REQ sketch offers relative error guarantee.
t-digest also prioritizes high and low rank accuracy see https://github.com/apache/datasketches-cpp/issues/416
I also found this discussion some years ago: https://github.com/apache/datasketches-java/issues/357
Thank you for the responses and the helpful references to prior discussions @AlexanderSaydakov — this has been very informative. I’m still interested in exploring a DDSketch implementation, even if just as an experimental project.
Based on the discussion you mentioned, I’ll reflect deeply on whether (and how) it might be possible to make somewhat fair comparisons between sketches with different goals and guarantees.
Please let me know if there could still be interest in this kind of work, or if there’s openness to reopening the discussion at some point down the line.
Of course, any new sketches are welcome if they have solid research behind them and have substantially different trade-offs compared to ones already implemented in the library. As I understand, closest competitors to DDSketch are REQ sketch and t-digest. Do you think that DDSketch can be a better choice for some applications?
I believe it offers accuracy guarantees that t-digest cannot, while providing dual-tailed relative error. It’s numeric so it can do things REQ can’t (two-sided) and is likely smaller, too.
I think it’s a reasonable sketch to include. But we’d want to ensure the API uses similar naming as our existing sketches.