[1.x] Add `date_part` scalar function to influxdb
Apache Datafusion, the query engine that powers Influx v3, supports the following function: https://datafusion.apache.org/user-guide/sql/scalar_functions.html#date-part
Please implement a similardate_part within influxdb 1.x.
date_part(part, expression)
Arguments
- part: Part of the date to return. The following date parts are supported:
- year
- quarter (emits value in inclusive range [1, 4] based on which quartile of the year the date is in)
- month
- week (week of the year)
- day (day of the month)
- hour
- minute
- second
- millisecond
- microsecond
- nanosecond
- dow (day of the week where Sunday is 0)
- doy (day of the year)
- epoch (seconds since Unix epoch)
- isodow (day of the week where Monday is 0)
- expression: Time expression to operate on. Must be time VarRef.
Example usage:
SELECT * FROM some_measurement
WHERE time >= now() - 10d AND time <= now() AND (date_part(time, dow) != 0 AND date_part(time, dow) != 6)
After doing some testing and writing up some code for this I've found that
SELECT date_part('dow', time) FROM foo
Where we have date_part(...)as the only field, this cannot work with the current storage engine semantics.
For example, let's take a look at math functions, specifically sin which does not aggregate (similar to date_part).
Given a schema which contains uuuu as an integer using some sample data locally.
Running SELECT sin(uuuu) FROM test_foo returns
> select sin(uuuu) from test_foo
name: test_foo
time sin
---- ---
1764702687714533000 -0.45990349068959124
1764703190922883000 -0.1464976141626518
1764704886398781000 0.6360869563962336
1764705088831162000 -0.8299531227173891
1764705880556364000 -0.7854470419381677
1764863647789047000 -0.7854470419381677
Running SELECT sin(time) FROM test_foo returns nothing.
> select sin(time) from test_foo
>
date_part has similar semantics, but, because it operates on the time field it does require that there are other fields in the SELECT clause.
> select uuuu, date_part('dow', time) from test_foo
name: test_foo
time uuuu date_part
---- ---- ---------
1764702687714533000 123 2
1764703190922883000 144444 2
1764704886398781000 9999 2
1764705088831162000 7777777 2
1764705880556364000 8989 2
1764863647789047000 8989 4
> select date_part('dow', time) from test_foo // without additional fields
>
Another way that date_part is similar to math functions:
With a math function, the field is created after WHERE clause evaluation.
> select sin(uuuu) from test_foo WHERE sin > 0
> select sin(uuuu) from test_foo WHERE sin(uuuu) > 0
name: test_foo
time sin
---- ---
1764704886398781000 0.6360869563962336
>
This means that we must call the same function signature within a WHERE clause when evaluating the predicate.
date_part works the same way:
> select uuuu, date_part('dow', time) from test_foo WHERE date_part = 2
> select uuuu, date_part('dow', time) from test_foo WHERE date_part('dow', time) = 2
name: test_foo
time uuuu date_part
---- ---- ---------
1764702687714533000 123 2
1764703190922883000 144444 2
1764704886398781000 9999 2
1764705088831162000 7777777 2
1764705880556364000 8989 2
>
If we would like it to work differently, i.e. allow single select semantics, and implement filtering on the column instead of function call, this would require non-trivial changes to the storage engine.