influxdb Performance difference between Flux and InfluxQL

Hello,

I've recently been doing some tests with influx 1.8 and noticed that Flux queries are considerably slower than their InfluxQL counterparts.

My dataset simulates one month of data every second. The tests consist of 3 main cases:

Getting the last 7 days of raw data
Getting the last 7 days average with an interval of 5 minutes
Getting the last 30 days average with an interval of 5 minutes

InfluxQL

SELECT value FROM "db"."autogen"."raw.int64" WHERE time > now() - 7d AND time < now() AND "key"='test-key1' AND "acp_name"='test-acp' AND "source"='test-source' FILL(null)

Getting the last 7 days of raw data takes on average 1.8s

SELECT mean("value") as "mean_value" FROM "db"."autogen"."raw.int64" WHERE time > now() - 7d AND time < now() AND "key"='test-key1' AND "acp_name"='test-acp' AND "source"='test-source' GROUP BY time(5m) FILL(null)

Getting the average of the last 7 days with an interval of 5 minutes takes on average 46.3ms

SELECT mean("value") as "mean_value" FROM "db"."autogen"."raw.int64" WHERE time > now() - 30d AND time < now() AND "key"='test-key1' AND "acp_name"='test-acp' AND "source"='test-source' GROUP BY time(5m) FILL(null)

Getting the average of the last 30 days with an interval of 5 minutes takes on average 201.3ms

Flux

from(bucket: "bucket") 
    |> range(start: -7d) 
    |> filter(fn: (r) => r._measurement == "raw.int64" and r._field == "value")
    |> filter(fn: (r) => r.acp_name == "test-acp" and r.key == "test-key1" and r.source == "source") 
    |> keep(columns: ["_time", "_value"])

Getting the last 7 days of raw data takes on average 26.7ms

from(bucket: "bucket") 
    |> range(start: -7d) 
    |> filter(fn: (r) => r._measurement == "raw.int64" and r._field == "value")
    |> filter(fn: (r) => r.acp_name == "test-acp" and r.key == "test-key1" and r.source == "source") 
    |> keep(columns: ["_time", "_value"])
    |> aggregateWindow(every: 5m, fn: mean)

Getting the average of the last 7 days with an interval of 5 minutes takes on average 1.3s

from(bucket: "bucket") 
    |> range(start: -30d) 
    |> filter(fn: (r) => r._measurement == "raw.int64" and r._field == "value")
    |> filter(fn: (r) => r.acp_name == "test-acp" and r.key == "test-key1" and r.source == "source") 
    |> keep(columns: ["_time", "_value"])
    |> aggregateWindow(every: 5m, fn: mean)

Getting the average of the last 30 days with an interval of 5 minutes takes on average 6.2s

The only case where Flux is faster is when getting raw data. If I remove the keep statement the 30d query execution time jumps up to close to 12s. Also when running the Influx v2 beta these queries are marginally slower. I understand the interval is small, but still, the performance difference is huge.

I'd like to use some features present in the Influx v2 Go client however (e.g. CSV responses), however the performance penalty is too big to ignore.

Best Regards

May 13 '20 18:05 Surtr04

@Surtr04 Thanks for the thorough report.

We are currently working on some optimizations to make the type of Flux query you are using significantly faster. We are specifically improving the keep implementation and also working to improve aggregateWindow to push to processing of the data closer to where the data lives.

We will keep an eye on this issue and let you know when you can expect some improvements.

Thanks again for taking the time to write up your findings.

May 28 '20 20:05 nathanielc

Hi, do you have a rough timeplan when the performance significantly improves for influxdb 1.8? This prevents me also from using that database. For my use case I need to use the flux language because I want to calculate the mean over different measurements. Thx

Sep 14 '20 17:09 thaisthaisen

Has this been solved? I believe to see as well significant differences in performance between Flux and InfluxQL with InfluxDB 1.8.0

Oct 28 '20 11:10 wolfig

InfluxDB shell version: 1.8.4

I'm seeing 10x difference between InfluxQL and Flux. 0m0.070s vs 0m0.841s

Starting point:

SELECT mean(tank4_percent_wo), min(tank4_percent_wo), max(tank4_percent_wo) from vessmon_data where time>1618217877s and time<1618304277s group by time(5m) limit 10

Example:

time curl "http://127.0.0.1:28086/api/v2/query" \
  -H "Accept: application/csv" \
  -H "Content-type: application/vnd.flux" \
  -d '
data = from(bucket:"vessmon/autogen")
|> range(start: 2021-04-12T11:57:57+03:00, stop: 2021-04-13T11:57:57+03:00)
|> filter(fn:(r) => r._measurement == "vessmon_data" and r._field =="tank4_percent_wo")
|> window(every: 5m)

mean_d = data |> mean() |> set(key: "_field", value: "mean")
min_d = data |> min() |> set(key: "_field", value: "min")
max_d = data |> max() |> set(key: "_field", value: "max")

union(tables: [mean_d, min_d, max_d])
|> pivot(rowKey:["_start"], columnKey: ["_field"], valueColumn: "_value")
|> duplicate(column: "_stop", as: "_time")
|> window(every: inf)
|> limit(n:10, offset: 0)
  '


time curl -G 'http://127.0.0.1:28086/query?pretty=true' --data-urlencode "db=vessmon" --data-urlencode "q=SELECT mean(tank4_percent_wo), min(tank4_percent_wo), max(tank4_percent_wo) from vessmon_data where time>1618217877s and time<1618304277s group by time(5m)  limit 10"

Result:

#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,string,double,double,double,dateTime:RFC3339
#group,false,false,true,true,true,false,false,false,false
#default,_result,,,,,,,,
,result,table,_start,_stop,_measurement,max,mean,min,_time
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,151,150.46341463414635,149,2021-04-12T09:00:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,127,125,123,2021-04-12T09:05:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,132,130.5,129,2021-04-12T09:10:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,137,135,133,2021-04-12T09:15:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,142,140,138,2021-04-12T09:20:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,147,145,143,2021-04-12T09:25:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,152,150,148,2021-04-12T09:30:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,153,130.2,123,2021-04-12T09:35:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,131,129.25,127,2021-04-12T09:40:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,136,134,132,2021-04-12T09:45:00Z


real    0m0.841s
user    0m0.008s
sys     0m0.000s
{
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "name": "vessmon_data",
                    "columns": [
                        "time",
                        "mean",
                        "min",
                        "max"
                    ],
                    "values": [
                        [
                            "2021-04-12T08:55:00Z",
                            150.47540983606558,
                            149,
                            151
                        ],
                        [
                            "2021-04-12T09:00:00Z",
                            125,
                            123,
                            127
                        ],
                        [
                            "2021-04-12T09:05:00Z",
                            130.5,
                            129,
                            132
                        ],
                        [
                            "2021-04-12T09:10:00Z",
                            135,
                            133,
                            137
                        ],
                        [
                            "2021-04-12T09:15:00Z",
                            140,
                            138,
                            142
                        ],
                        [
                            "2021-04-12T09:20:00Z",
                            145,
                            143,
                            147
                        ],
                        [
                            "2021-04-12T09:25:00Z",
                            150,
                            148,
                            152
                        ],
                        [
                            "2021-04-12T09:30:00Z",
                            130.2,
                            123,
                            153
                        ],
                        [
                            "2021-04-12T09:35:00Z",
                            129.25,
                            127,
                            131
                        ],
                        [
                            "2021-04-12T09:40:00Z",
                            134,
                            132,
                            136
                        ]
                    ]
                }
            ]
        }
    ]
}

real    0m0.070s
user    0m0.003s
sys     0m0.008s

Tried same queries on same data upgraded to v2.0 (different machine).

influxQL: 0m0.024s flux: 0m0.265s

Queries:


time curl "http://127.0.0.1:8086/api/v2/query?org=vessmon" \
  -H "Authorization: Token token" \
  -H "Accept: application/csv" \
  -H "Content-type: application/vnd.flux" \
  -d '
data = from(bucket:"vessmon/autogen")
|> range(start: 2021-04-12T11:57:57+03:00, stop: 2021-04-13T11:57:57+03:00)
|> filter(fn:(r) => r._measurement == "vessmon_data" and r._field =="tank4_percent_wo")
|> window(every: 5m)

mean_d = data |> mean() |> set(key: "_field", value: "mean")
min_d = data |> min() |> set(key: "_field", value: "min")
max_d = data |> max() |> set(key: "_field", value: "max")

union(tables: [mean_d, min_d, max_d])
|> pivot(rowKey:["_start"], columnKey: ["_field"], valueColumn: "_value")
|> duplicate(column: "_stop", as: "_time")
|> window(every: inf)
|> limit(n:10, offset: 0)
  '


time curl --get "http://127.0.0.1:8086/query?db=vessmon" \
  --header "Authorization: Token token" \
  --data-urlencode "q=SELECT mean(tank4_percent_wo), min(tank4_percent_wo), max(tank4_percent_wo) from vessmon.autogen.vessmon_data where time>1618217877s and time<1618304277s group by time(5m) limit 10"

Result:

,result,table,_start,_stop,_measurement,max,mean,min,_time
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,151,150.46341463414635,149,2021-04-12T09:00:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,127,125,123,2021-04-12T09:05:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,132,130.5,129,2021-04-12T09:10:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,137,135,133,2021-04-12T09:15:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,142,140,138,2021-04-12T09:20:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,147,145,143,2021-04-12T09:25:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,152,150,148,2021-04-12T09:30:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,153,130.2,123,2021-04-12T09:35:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,131,129.25,127,2021-04-12T09:40:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,136,134,132,2021-04-12T09:45:00Z


real    0m0.265s
user    0m0.005s
sys     0m0.000s
{"results":[{"statement_id":0,"series":[{"name":"vessmon_data","columns":["time","mean","min","max"],"values":[["2021-04-12T08:55:00Z",150.47540983606558,149,151],["2021-04-12T09:00:00Z",125,123,127],["2021-04-12T09:05:00Z",130.5,129,132],["2021-04-12T09:10:00Z",135,133,137],["2021-04-12T09:15:00Z",140,138,142],["2021-04-12T09:20:00Z",145,143,147],["2021-04-12T09:25:00Z",150,148,152],["2021-04-12T09:30:00Z",130.2,123,153],["2021-04-12T09:35:00Z",129.25,127,131],["2021-04-12T09:40:00Z",134,132,136]]}]}]}

real    0m0.024s
user    0m0.000s
sys     0m0.005s

Apr 13 '21 10:04 aldas

So I have 2 instances of InfluxDB (1.8.4 and 2.0.4) running in same VPS. Queries are same:

Even for v2.0 Flux is ~10x slower than InfluxQL - is this normal?

Flux and InfluxQL queries against 1.8.4:

x@x-hel:~$ ./test1_8.sh 
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,string,double,double,double,dateTime:RFC3339
#group,false,false,true,true,true,false,false,false,false
#default,_result,,,,,,,,
,result,table,_start,_stop,_measurement,max,mean,min,_time
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,151,150.46341463414635,149,2021-04-12T09:00:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,127,125,123,2021-04-12T09:05:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,132,130.5,129,2021-04-12T09:10:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,137,135,133,2021-04-12T09:15:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,142,140,138,2021-04-12T09:20:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,147,145,143,2021-04-12T09:25:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,152,150,148,2021-04-12T09:30:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,153,130.2,123,2021-04-12T09:35:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,131,129.25,127,2021-04-12T09:40:00Z
,,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,136,134,132,2021-04-12T09:45:00Z


real    0m0.848s
user    0m0.011s
sys     0m0.004s
{"results":[{"statement_id":0,"series":[{"name":"vessmon_data","columns":["time","mean","min","max"],"values":[["2021-04-12T08:55:00Z",150.47540983606558,149,151],["2021-04-12T09:00:00Z",125,123,127],["2021-04-12T09:05:00Z",130.5,129,132],["2021-04-12T09:10:00Z",135,133,137],["2021-04-12T09:15:00Z",140,138,142],["2021-04-12T09:20:00Z",145,143,147],["2021-04-12T09:25:00Z",150,148,152],["2021-04-12T09:30:00Z",130.2,123,153],["2021-04-12T09:35:00Z",129.25,127,131],["2021-04-12T09:40:00Z",134,132,136]]}]}]}

real    0m0.066s
user    0m0.006s
sys     0m0.007s

Flux and InfluxQL queries against 2.0.4:

x@x-hel:~$ ./test2_0.sh 
,result,table,_start,_stop,_measurement,max,mean,min,_time
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,151,150.46341463414635,149,2021-04-12T09:00:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,127,125,123,2021-04-12T09:05:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,132,130.5,129,2021-04-12T09:10:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,137,135,133,2021-04-12T09:15:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,142,140,138,2021-04-12T09:20:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,147,145,143,2021-04-12T09:25:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,152,150,148,2021-04-12T09:30:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,153,130.2,123,2021-04-12T09:35:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,131,129.25,127,2021-04-12T09:40:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,136,134,132,2021-04-12T09:45:00Z


real    0m0.724s
user    0m0.010s
sys     0m0.004s
{"results":[{"statement_id":0,"series":[{"name":"vessmon_data","columns":["time","mean","min","max"],"values":[["2021-04-12T08:55:00Z",150.47540983606558,149,151],["2021-04-12T09:00:00Z",125,123,127],["2021-04-12T09:05:00Z",130.5,129,132],["2021-04-12T09:10:00Z",135,133,137],["2021-04-12T09:15:00Z",140,138,142],["2021-04-12T09:20:00Z",145,143,147],["2021-04-12T09:25:00Z",150,148,152],["2021-04-12T09:30:00Z",130.2,123,153],["2021-04-12T09:35:00Z",129.25,127,131],["2021-04-12T09:40:00Z",134,132,136]]}]}]}

real    0m0.087s
user    0m0.004s
sys     0m0.008s

Apr 13 '21 20:04 aldas

So in 2.0.7 performance between InfluxQL and Flux has improved but it is still 2x slower for first query and repeating same query is much faster for InfluQL probably due some caching or something

x@x:~/$ docker exec -it db2_influxdb_1 influx version
Influx CLI 2.0.7 (git: 2a45f0c037) build_date: 2021-06-04T19:17:40Z

InfluxQL:

x@x:~/$ time curl --get "http://127.0.0.1:48086/query?db=vessmon" \
>   --header "Authorization: Token token" \
>   --data-urlencode "q=SELECT mean(tank4_percent_wo), min(tank4_percent_wo), max(tank4_percent_wo) from vessmon.autogen.vessmon_data where time>1618217877s and time<1618304277s group by time(5m) limit 10"
{"results":[{"statement_id":0,"series":[{"name":"vessmon_data","columns":["time","mean","min","max"],"values":[["2021-04-12T08:55:00Z",150.47540983606558,149,151],["2021-04-12T09:00:00Z",125,123,127],["2021-04-12T09:05:00Z",130.5,129,132],["2021-04-12T09:10:00Z",135,133,137],["2021-04-12T09:15:00Z",140,138,142],["2021-04-12T09:20:00Z",145,143,147],["2021-04-12T09:25:00Z",150,148,152],["2021-04-12T09:30:00Z",130.2,123,153],["2021-04-12T09:35:00Z",129.25,127,131],["2021-04-12T09:40:00Z",134,132,136]]}]}]}

real    0m0.363s
user    0m0.006s
sys     0m0.025s

Flux

x@x:~/$ time curl "http://127.0.0.1:48086/api/v2/query?org=local" \
>   -H "Authorization: Token token" \
>   -H "Accept: application/csv" \
>   -H "Content-type: application/vnd.flux" \
>   -d '
> data = from(bucket:"vessmon/autogen")
> |> range(start: 2021-04-12T11:57:57+03:00, stop: 2021-04-13T11:57:57+03:00)
> |> filter(fn:(r) => r._measurement == "vessmon_data" and r._field =="tank4_percent_wo")
> |> window(every: 5m)
> 
> mean_d = data |> mean() |> set(key: "_field", value: "mean")
> min_d = data |> min() |> set(key: "_field", value: "min")
> max_d = data |> max() |> set(key: "_field", value: "max")
> 
> union(tables: [mean_d, min_d, max_d])
> |> pivot(rowKey:["_start"], columnKey: ["_field"], valueColumn: "_value")
> |> duplicate(column: "_stop", as: "_time")
> |> window(every: inf)
> |> limit(n:10, offset: 0)
>   '
,result,table,_start,_stop,_measurement,max,mean,min,_time
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,151,150.46341463414635,149,2021-04-12T09:00:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,127,125,123,2021-04-12T09:05:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,132,130.5,129,2021-04-12T09:10:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,137,135,133,2021-04-12T09:15:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,142,140,138,2021-04-12T09:20:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,147,145,143,2021-04-12T09:25:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,152,150,148,2021-04-12T09:30:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,153,130.2,123,2021-04-12T09:35:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,131,129.25,127,2021-04-12T09:40:00Z
,_result,0,2021-04-12T08:57:57Z,2021-04-13T08:57:57Z,vessmon_data,136,134,132,2021-04-12T09:45:00Z


real    0m0.651s
user    0m0.007s
sys     0m0.013s

Note: if I try those queries multiple times it seems that InfluxQL response is cached or I do not know - at least subsequent queries are faster 0m0.053s (influxQL) vs 0m0.468s (flux)

Jul 07 '21 09:07 aldas

I update my InfluxDB from 1.8 to 2.2.0. I create a new bucket since I want to change a very little bit my data, and I create a script to transform this data from the old bucket to a new one. I get very different execution times querying the old and the new bucket, with the same data stored, and I use the same Flux query! It's possible that the engine is different between the two buckets (one upgraded from 1.8 and the other one created new in 2.2.0)?

May 09 '22 12:05 dometec

I use Influx 2.6 on Windows with activated InfluxQL. I use Grafana 9.2.10 on Windows and added two Data sources:

InfluxDB 2 with Flux
InfluxDB 2 with InfluxQL

I have a Panel with over 20 Queries.

Flux needs more than 2 Seconds
InfluxQL needs about 500ms

It seems there is a lot of work....

Jan 31 '23 08:01 UlrichThiess

We are also very interested in a resolution to this

Jan 31 '23 13:01 MatinF

The likelihood of this issue being fixed is extremely low, Flux is removed in InfluxDB 3 and influxdata is recommending to stick with InfluxQL going forward, you'll have to use 3rd party software to translate Flux to SQL/InfluxQL on the fly on InfluxDB 3

Sources: The Plan for InfluxDB 3.0 Open Source Is Flux being deprecated with InfluxDB 3.0?

Oct 12 '23 14:10 vincejv

That is very unfortunate. Flux allows for quite complex and extensive programmings. Once you have understood the language, a wide and varied range of possibilities open up for modifying and expanding tables. I hope that all possibilities can also be implemented with InfluxQL.

Oct 12 '23 15:10 UlrichThiess

In one of their videos the message was (paraphrasing from memory): "we hoped that Flux would get wider tracktion as a query language, but that has not happened....so we are phasing it out". Or something to that effect. Sorry, don't have the video handy.

I already switched to influxql because of very clear performance difference / below par performance in very basic use cases.

Oct 12 '23 16:10 tkurki

Has anyone investigated usage of influxQL as a replacement of periodic flux alerts?

Some context: I'd like to do something like that every 1s: fetch last 1 second of data(), |> fieldsAsCols(), |> groupby(tags), |> last()

and additionally count ok/nok status instances. And it's too much for flux, so any alternatives would be appreciated

Dec 16 '23 16:12 ptrhbt

influxdb influxdb copied to clipboard

Performance difference between Flux and InfluxQL

influxdb
influxdb copied to clipboard