questdb.io
questdb.io copied to clipboard
Update - IN operator has some cool undocumented features
The IN
operator has some extra features. The documentation suggest we can add an extra modifier using semicolon as the separator to increase/decrease the time range. What it doesn't say is you can actually add extra parameters (not sure what the official name for those is) to repeat the filter in subsequent time periods.
Now with an example (you can run this at https://demo.questdb.io)
Basic IN
works like this and would search anything within that particular day
select pickup_datetime from trips WHERE pickup_datetime IN '2018-06-01T13:00:00'
Now I can extend to search from that timestamp and extend to an extra point in time, let's say 1 minute
select pickup_datetime from trips WHERE pickup_datetime IN '2018-06-01T13:00:00;1m'
So far, so good. All of this was documented. Now imagine I want to search records within the time interval of that minute, but in consecutive days. This is supported with this syntax:
select pickup_datetime from trips WHERE pickup_datetime IN '2018-06-01T13:00:00;1m;24h;10'
Or the equivalent
select pickup_datetime from trips WHERE pickup_datetime IN '2018-06-01T13:00:00;1m;1d;10'
What we are doing here is saying: I want to repeat this filter for every 24h (in the first query) or 1d (in the second query) and do this 10 times. The result of both queries above would be getting all the trips from 13:00 to 13:01 from 2018-06-01 to 2018-06-10.
Let's see what happens If we change the 24h/1d parameter to 12h, like this
select pickup_datetime from trips WHERE pickup_datetime IN '2018-06-01T13:00:00;1m;12h;10'
In this case it would give us results for each interval from 13:00 to 13:01 and also from 01:00 to 01:01, since we are repeating every 12h. Since we are doing this 10 times, we would be getting results for 5 days (10 intervals of 12 hours), finishing on 2018-06-06T01:01:00
Note: This is not supported if the modifier is a negative range
Location:
https://questdb.io/docs/reference/sql/where#time-range
Also incorporate https://github.com/questdb/questdb/issues/2509 when ready
As per Amy's suggestion, we can add some story here to illustrate how this could be used. Since we are using the taxi rides dataset, we could have the following:
We want to see how taxi rides have been evolving in the holiday of July 4th in New York City from 10am to 10pm, in 4 hours intervals, from 2009 until 2018. We set the starting timestamp as '2009-07-04T10:00:00', we add the modifier 12h to get until 10pm, then we add 1y;10 to repeat 10 times every year from the starting timestamp.
select max(pickup_datetime), count(*) from trips WHERE pickup_datetime IN '2009-07-04T10:00:00;12h;1y;10'
sample by 4h align to calendar
Note that if you don't know the 1st and last date in your database and you want to query the whole range, you could set up and arbitrary starting year and a large range, like
select max(pickup_datetime), count(*) from trips WHERE pickup_datetime IN '1970-07-04T10:00:00;12h;1y;1000'
sample by 4h align to calendar
create table scores as select timestamp_sequence(0,1) ts from long_sequence(1000)
- [ ] #1077
- [x] #1076
Also to track progress for https://github.com/questdb/questdb/issues/2509