
Feature Request: Add Approximate Mode/Frequent Items Support Using DataSketches’ FrequentItemsSketch

Open • chitralverma opened this issue 7 months ago • 1 comment

Hi team

I’m using the excellent DuckDB datasketches extension for large-scale analytics use cases. One common requirement in our datasets is computing the mode() (most frequent item) per group, but DuckDB's built-in exact mode() function leads to high memory usage or even OOMs when applied to large, high-cardinality datasets.

Feature Request

Please consider adding support for approximate mode estimation using FrequentItemsSketch from Apache DataSketches.

Why is this useful?

  • mode() is commonly needed in aggregations over grouped data, e.g.:
    SELECT x, y, mode(z) FROM table GROUP BY x, y;
    
  • On large datasets (e.g., 30M+ rows, 1K+ groups), the exact mode() leads to memory exhaustion.
  • Approximate mode with bounded error would be a great tradeoff and fits well into the sketch philosophy.
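
To make the "bounded error" idea concrete, here is a minimal Python sketch of a Misra-Gries style frequent-items summary. It is only an illustration of the general technique, not the DataSketches FrequentItemsSketch implementation and not an API of this extension; the function name `approx_mode` and the parameter `k` are hypothetical. It assumes at most `k` counters are kept in memory, in which case the reported count underestimates the true frequency by at most n/(k+1) for a stream of n items.

```python
def approx_mode(values, k=64):
    """Return (candidate_mode, lower_bound_count) using at most k counters.

    Hypothetical helper for illustration only (Misra-Gries summary).
    The returned count is a lower bound on the true frequency and is
    off by at most n / (k + 1), where n is the number of items seen.
    """
    counters = {}
    for v in values:
        if v in counters:
            counters[v] += 1
        elif len(counters) < k:
            counters[v] = 1
        else:
            # No free counter: decrement every tracked item and
            # drop the ones that reach zero. The new item is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    if not counters:
        return None, 0
    item = max(counters, key=counters.get)
    return item, counters[item]


# Usage: one dominant value among many distinct ones, tracked with 8 counters.
stream = ["a"] * 600 + [f"x{i}" for i in range(400)]
item, count = approx_mode(stream, k=8)
print(item, count)
```

Memory stays proportional to `k` rather than to the number of distinct values, which is the tradeoff being requested. Note that the candidate is only guaranteed to be the true mode when its lead over the runner-up exceeds the error bound; a per-group version would keep one such summary per (x, y) group.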

References

chitralverma • Jul 04 '25 06:07

Hi @chitralverma,

Thanks for the thoughtful feature request and for your kind words about the DuckDB datasketches extension — we’re glad to hear it’s proving useful for your large-scale analytics workflows.

We agree that approximate mode estimation via FrequentItemsSketch would be a valuable addition, especially for high-cardinality use cases where exact mode() is impractical. That said, we want to be transparent that this extension is just one part of a larger roadmap, and at the moment, this particular feature is not at the top of our current priorities.

However, if this functionality is urgent for your team or organization, we do offer paid consulting and development services. This helps us prioritize specific features like this one and accelerates their delivery. If you’re interested in exploring that option, feel free to reach out to us at [email protected].

Thanks again for engaging with the project — and we hope to continue improving it with input like yours.

Best,
The Query.Farm Team

rustyconover • Jul 04 '25 15:07