
Increase SQL Query Histogram Range Beyond 10ms for Latency Tracking

Open aryanmehrotra opened this issue 2 months ago • 2 comments

The current 10 ms upper bound on GoFr’s SQL query histogram buckets is great for ultra-fast query visibility (for apps serving millions of requests per minute), but it compresses all slower queries into the same top bucket, which makes real-world latency analysis difficult — especially in production workloads, where 100 ms–5 s queries are normal for analytical or transactional databases.


⚙️ Why 10 ms was used initially

  • It was designed to detect micro-latency regressions in high-throughput services (like internal GoFr benchmarks).
  • Early use cases focused on microservices hitting in-memory or well-indexed SQL queries where anything >10 ms was an anomaly.
  • It optimized for Prometheus cardinality — fewer histogram buckets = smaller metrics footprint.

Why that’s limiting in real workloads

  • Queries taking 20 ms, 200 ms, or 2 s all fall into the same bucket — you lose resolution.
  • Makes it impossible to differentiate between “slightly slow” and “critical” queries.
  • You can’t correlate query latency with API request latency effectively.
  • Makes alerting thresholds (like P95 or P99 query latency) misleadingly small.

Recommended Fix

Adopt a wider histogram range with logarithmic or percentile-style bucket spacing.

Example ideal bucket set for SQL histograms:

[]float64{
    0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 20, 30, 60,
}

This covers:

  • Millisecond-level detail for fast queries (down to the 1 ms bucket)
  • Smooth spread for mid-range queries
  • Full visibility up to 60 seconds for extreme cases

🧩 Dynamic Adjustment (Could be an option)

Instead of hardcoding buckets, expose them via configuration — for example:

app.Metrics.SetHistogramBuckets("sql_query_duration_seconds", []float64{
    0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10, 30, 60,
})

You could tie this to:

  • Datasource type (MySQL, Postgres, BigQuery, etc.)
  • Request timeout (e.g., if REQUEST_TIMEOUT=30s, upper bound could be 30 s)
  • Environment (dev → short, prod → wide)

📊 Impact

  • Slightly higher Prometheus storage footprint (more buckets)

  • Far more actionable insights:

    • 95th percentile query time per DB
    • Breakdown of fast vs slow queries
    • Easier detection of query regressions after deployments

aryanmehrotra avatar Oct 27 '25 15:10 aryanmehrotra

Hi @aryanmehrotra, I’d love to fix this issue! I have some experience with Go and performance instrumentation, and I’d like to help implement the wider SQL query histogram range as described. Could you please assign this issue to me or guide me on how to proceed?

nitin-999-code avatar Nov 11 '25 16:11 nitin-999-code

  • This issue is being tracked in PR #2526

Umang01-hash avatar Nov 13 '25 08:11 Umang01-hash