logfire icon indicating copy to clipboard operation
logfire copied to clipboard

`WITHIN GROUP` query syntax

Open samuelcolvin opened this issue 1 year ago • 3 comments
trafficstars

@@frankie567:

I just tried with this new database engine and it's clearly day-and-night in terms of performance. We struggled before to load our dashboards for timespans > 3 hours; now it's able to load 30 days span pretty quickly. Really great move 👏

I've an existing query that's no longer compatible. The goal was to get request durations percentiles. It looks like this:

WITH duration AS (
  SELECT
    time_bucket('%time_bucket_duration%', start_timestamp) AS x,
    extract(milliseconds from end_timestamp - start_timestamp) as duration
  FROM records
  WHERE otel_scope_name = 'opentelemetry.instrumentation.asgi' and parent_span_id is null
)
SELECT
  x,
  percentile_cont(0.99) WITHIN GROUP (ORDER BY duration) as percentile_99,
  percentile_cont(0.95) WITHIN GROUP (ORDER BY duration) as percentile_95
FROM duration
GROUP BY x
ORDER BY x DESC;

But DataFusion states it doesn't support WITHIN GROUP:

Capture d’écran 2024-09-18 à 17 19 35

I'm pretty sure there is another way to achieve that query but don't have time right now to explore this ☺️

Originally posted by @frankie567 in https://github.com/pydantic/logfire/issues/408#issuecomment-2358768322

samuelcolvin avatar Sep 18 '24 17:09 samuelcolvin

Thanks @frankie567 we'll look into this, I think @alexmojaki has something similar working until we can support that syntax in datafusion.

samuelcolvin avatar Sep 18 '24 17:09 samuelcolvin

I've created https://github.com/apache/datafusion/issues/12533, and commented on https://github.com/apache/datafusion/issues/11732. Once I get some guidance on next steps, I'll try to work on it.

samuelcolvin avatar Sep 19 '24 10:09 samuelcolvin

Same here I have a query using WITHIN thet fails, here getting slowest SQL ordering by P90


WITH query_durations AS (
    SELECT 
        attributes->>'db.statement' as query, 
        EXTRACT(EPOCH FROM (end_timestamp - start_timestamp)) * 1000 AS duration_ms 
    FROM 
        records
    WHERE 
        attributes->>'db.system' = 'influxdb'
        AND service_name = 'server'
),
p90_durations AS (
    SELECT 
        query, 
        percentile_cont(0.90) WITHIN GROUP (ORDER BY duration_ms) AS p90_duration_ms
    FROM 
        query_durations
    GROUP BY 
        query
)
SELECT 
    query, 
    p90_duration_ms
FROM 
    p90_durations
ORDER BY 
    p90_duration_ms DESC
LIMIT 10;

jules-ch avatar Sep 26 '24 23:09 jules-ch

I've updated the query in https://logfire.pydantic.dev/docs/integrations/use-cases/web-frameworks/#query-http-requests-duration-per-percentile:

WITH dataset AS (
  SELECT
    time_bucket('%time_bucket_duration%', start_timestamp) AS x,
    (extract(ms from end_timestamp - start_timestamp)) as duration_ms
  FROM records
  WHERE attributes ? 'http.method'
)
SELECT
  x,
  approx_percentile_cont(duration_ms, 0.50) as percentile_50,
  approx_percentile_cont(duration_ms, 0.90) as percentile_90,
  approx_percentile_cont(duration_ms, 0.95) as percentile_95,
  approx_percentile_cont(duration_ms, 0.99) as percentile_99
FROM dataset
GROUP BY x
ORDER BY x

This works with the new database and doesn't need WITHIN GROUP at all.

alexmojaki avatar Oct 14 '24 13:10 alexmojaki

Yeah I managed to find a workaround with datafusion fucntions

jules-ch avatar Oct 15 '24 07:10 jules-ch

Works perfectly well, thank you @alexmojaki 🙏

frankie567 avatar Oct 15 '24 08:10 frankie567

OK, closing this because I think the main problem was the docs recommending a query that didn't work with datafusion. With time this syntax may be supported but it's not really a logfire issue.

alexmojaki avatar Oct 15 '24 09:10 alexmojaki