influxdb Support disparate time intervals and more advanced time in WHERE clauses

Feature Request

I'm calling this a feature request, but it's both a feature request and a bug in the existing query engine. Explained more below.

Proposal: Support disparate time intervals and more advanced time in WHERE clauses.

Current behavior: You can only select one continuous block of time. There is no way to select multiple blocks of time even though the query language implies it should be possible. In fact, we completely ignore any of the query language that would suggest that it's possible in favor of doing something unexpected. Given the following examples:

SELECT * FROM cpu WHERE time >= now() - 10m OR (time >= now() - 20m AND region = 'uswest')

The above query says you want any points within the last 10 minutes or within the last 20 minutes if the region is uswest. What actually happens is we take all points from the last 10 minutes because we naively look through every binary expression that has time as one of the arguments and assume that they were put together with AND.

Desired behavior: Parse the WHERE clause to retrieve the disparate time periods that it specifies correctly. The above example would create two different time intervals. One where the time was anything within the last 10 minutes and one that was within the last 20 minutes if the region was uswest.

Use case: It prevents confusion from people who are attempting to use the WHERE clause in this way and then they find out we do not parse the time clauses in the way that we imply we do. It also opens possibilities in the future of implementing new syntax for selecting only time points in the morning or the afternoon.

I'm not sure how useful this is so it will likely remain low priority. If this is important for your use case, please add a 👍 reaction to this issue and explain your use case so we can learn more about this issue and those it affects.

Thanks!

Oct 27 '16 18:10 jsternberg

Hello!

I have a simple use case on that issue, where the implementation of OR would be very useful.

We are studying availability rates of devices, and in the calculation we need frequently to remove uninteresting periods of time. We use influxDB as a database, and queries are made from our python script. Example: calculation of availability between 2017-05-20 and 2017-06-20, but the devices was in maintenance between 2017-05-24 and 2017-05-26, so its irrelevant to consider that period.

Without the OR, we need to make two queries.

Thus the implementation of OR would be awesome!! Is there any plan on that?

Aug 01 '17 13:08 lblt

We are trying detecting anomalies by comparing current hour with the same hour of the previous three four weeks. For that we need to specify different time ranges

Aug 31 '17 09:08 ahsanali

I'm not sure how disparate intervals will help with that unless we also support time shift, which the current query engine does not. We are aware of time shift and want to support it in the future, but we are running into the limits of using an SQL-like language for things like that.

At the same time, you can just use two separate queries and you will get the data you want.

Aug 31 '17 17:08 jsternberg

To make a note on this issue, we may not be able to do this with the current query engine because of a little weird aspect of the query parsing that too many people likely rely on.

The following condition is parsed in a way that many likely won't expect.

WHERE host = 'server01' OR host = 'server02' AND time > now() - 1d

This actually gets parsed as something equivalent to this:

WHERE host = 'server01' OR (host = 'server02' AND time > now() - 1d)

That would mean the time restriction should only be applied to host = 'server02', but the current query engine applies that time range to both conditions at the moment. If we fixed this bug/implemented this feature, we would have to correct this behavior and old queries would break. There is at least one high profile tool that wrote queries with this kind of error since I just fixed the query in Grafana with grafana/grafana#9131, so I'm not sure we can ever move from this for SQL.

Stay tuned for ifql 2.0 though where we will ensure we do not have this kind of oversight again.

Aug 31 '17 17:08 jsternberg

I'm trying to filter the series with a regex (i.e. WHERE time =~ /regex/) in order to get only data during business hours, i.e. timestamps between 07:00 and 19:00 - regardless of the date. I'm getting no data points back because - I assume - that the query would create disparate time intervals with one data point each - just like with the unsupported "OR". I do not seem to find a way to filter data based on timestamp. The suggested workaround with separate queries does not work...

SELECT mean("perc") FROM "occupancy_avg" WHERE time =~ /'....-..-..T(0[7-9]|1[0-9]):..:..Z'/ GROUP BY time(1h) fill(null)

PS: I think that features like filtering data based on time are very important (business hours, maintenance windows, etc...)

PPS: any hint is appreciated. I'm using influx with Grafana

Oct 19 '17 16:10 abrandl70

This would be useful. I thought I would share the MUCH slower workaround. Since 1.2 you can use sub queries. This means you can use syntax like:

Select mean(*) from 
(select * from channel where (time > '2017-05-01 13:00:00' and time < '2017-05-01 15:00:00') ), 
(select * from channel where (time > '2017-05-01 18:00:00' and time < '2017-05-01 19:00:00') )
GROUP BY time(20m)fill(none);

Apr 24 '18 14:04 andyflatt

That workaround is how we would likely implement this internally. The big problem right now is that we can't really change the behavior of how the time selectors work because they don't follow the normal rules you would expect when selecting disparate times. So to make this work as you would logically expect it to given the conditions, we would have to make existing queries break and at least some versions of Grafana use a method that would break. I think I submitted a fix some time ago to get Grafana to stop constructing the queries incorrectly, but it doesn't remove the fact that a lot of people likely rely on the incorrect behavior.

I'm sorry that's the case.

Apr 24 '18 16:04 jsternberg

@jsternberg so you'll leave incorrect implementation in forever because of backward compatibility, instead of doing the right thing of fixing it?

Do not the project have guidelines how to deal with this type of situation? For example have some flags that would by default support old behaviour for a few releases and then would by default support the new behaviour for a few releases and then remove the incorrect behaviour altogether?

The timelime for phasing out the incorrect one can be years, so that everyone who depend on it has time to update.

To me that would seem logical, saying this is wrong but we are not fixing it sounds strange.

Apr 24 '18 20:04 AndrewSav

The current plan is to fix this as part of ifql since ifql, by language design, won't have this issue. It's being worked on right now. By the time we would be able to phase out the incorrect behavior and find all of the places for it to be fixed in the wild, ifql will likely be mature and have more features than the current influxql.

I apologize that we are not able to fix this in a better way, but mistakes happen and that's the most I can say given the circumstances. I really wish I could have fixed this issue and I did try once in the past. Here's the commit where I fixed the behavior and here's the one where I had to revert the fix.

Apr 24 '18 20:04 jsternberg

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Jul 24 '19 00:07 stale[bot]

The current plan is to fix this as part of ifql since ifql, by language design, won't have this issue.

Can we get update on this? Is this still happening?

Jul 24 '19 01:07 AndrewSav

I also happen to have run into that bug minutes ago, so I'm wondering what is the recommended solution as of today. Is it still to perform N queries for the N intervals we want to query (and then join them manually if we are using GROUP BY)?

Jul 24 '19 02:07 MickaelBergem

Same question that @MickaelBergem , i found this problem right now so in the version 1.7.8 still happening

Sep 30 '19 09:09 aitorMarrero

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Dec 29 '19 09:12 stale[bot]

Still would like this to be fixed.

Dec 29 '19 18:12 AndrewSav

This silently (!) does not work: DELETE FROM "x" where time = 1 OR time = 5 we don't even get an error, it simply does not delete. (tried InfluxDB v1.4.2)

Jul 13 '20 16:07 FelixJongleur42

Usually no error given if a filter returns 0 records. May be that's what's happening here?

Aug 16 '20 23:08 AndrewSav

Do we have any workaround for this other than the sub-query approach suggested above?

Aug 30 '20 06:08 KishanKishore

Nearly 4 years later and I still trip over this bug on which people apparently rely on working. Please fix.

Sep 24 '20 12:09 one4many

Usually no error given if a filter returns 0 records. May be that's what's happening here?

No that is not the case since these two times exist. It should give an error that disparate queries are not supported.

Works: DELETE FROM "x" where time = 1; DELETE FROM "x" where time = 5

Silently (!) fails: DELETE FROM "x" where time = 1 OR time = 5

Sep 24 '20 19:09 FelixJongleur42

Keep an eye on this issue

Oct 26 '20 02:10 lty628

这将是有用的。我以为我会分享慢得多的解决方法。自 1.2 起，您可以使用子查询。这意味着您可以使用以下语法：
Select mean(*) from 
(select * from channel where (time > '2017-05-01 13:00:00' and time < '2017-05-01 15:00:00') ), 
(select * from channel where (time > '2017-05-01 18:00:00' and time < '2017-05-01 19:00:00') )
GROUP BY time(20m)fill(none);

thinks

Oct 26 '20 02:10 lty628

Is this supported with Flux as I understand this will never be solved on InfluxQL.

Mar 22 '21 16:03 AlexGrs

This can be done with Flux by using from |> range on two different ranges and then union() to merge the two results. You can also do something like:

import "date"
from(bucket: "mybucket")
    |> range(start: -1w)
    |> filter(fn: (r) => r._measurement == "m0" and r._field == "f0")
    |> filter(fn: (r) => date.hour(t: r._time) >= 9 and date.hour(t: r._time) < 15)

Just be sure you use the second filter after the first. The second filter won't be pushed down into the database.

Jun 09 '21 14:06 jsternberg

Can anyone tell when this will be fixed ? I really need this. Because this bug is causing performance issues.

Nov 11 '22 13:11 Neofield-Dylan

That workaround is how we would likely implement this internally. The big problem right now is that we can't really change the behavior of how the time selectors work because they don't follow the normal rules you would expect when selecting disparate times. So to make this work as you would logically expect it to given the conditions, we would have to make existing queries break and at least some versions of Grafana use a method that would break. I think I submitted a fix some time ago to get Grafana to stop constructing the queries incorrectly, but it doesn't remove the fact that a lot of people likely rely on the incorrect behavior.

I'm sorry that's the case.

Not fixing something because "people rely on the incorrect behavior" is never a good approach.

Dec 01 '22 21:12 shrx

有没有比较好的方案实现这个功能，我真的很需要它去解决性能上的一些问题。

Apr 10 '23 08:04 cy2ly

Has this been addressed? The way the 'OR' behaves in WHERE clause is WRONG! Is Influx going to have it fixed?

Jan 22 '24 16:01 leonidmakarovsky

Has this been addressed? The way the 'OR' behaves in WHERE clause is WRONG! Is Influx going to have it fixed?

As far I can tell, they will never fix this. You can see that they add a FAQ for this problem. I just gave up of influx. It just bad design. Has very weird behaviour and non commercial patterns implemented into it. Another example is a issue that I had when updating a row. Instead of just updating it, it created a new column which make I lose the db because it became too slow. I had to delete and create a new db and restore data into it. Every single fix that I had to do in influx ended like this (delete db and create a new with data into it). And when the data increased, I had performance issues and big memory consumption, because my data has many measurements and each one is a timeseries. On top of that influx cloud solution is too expensive, and manual devops is too hard when influx is using more and more memory.

Jan 22 '24 16:01 Neofield-Dylan

influxdb influxdb copied to clipboard

Support disparate time intervals and more advanced time in WHERE clauses

Feature Request

influxdb
influxdb copied to clipboard