influxdb icon indicating copy to clipboard operation
influxdb copied to clipboard

[1.x] Add `date_part` scalar function to influxdb

Open devanbenz opened this issue 4 weeks ago • 1 comments

Apache Datafusion, the query engine that powers Influx v3, supports the following function: https://datafusion.apache.org/user-guide/sql/scalar_functions.html#date-part

Please implement a similardate_part within influxdb 1.x.

date_part(part, expression)

Arguments

- part: Part of the date to return. The following date parts are supported:
   - year
    - quarter (emits value in inclusive range [1, 4] based on which quartile of the year the date is in)
    - month
    - week (week of the year)
    - day (day of the month)
    - hour
    - minute
    - second
    - millisecond
    - microsecond
    - nanosecond
    - dow (day of the week where Sunday is 0)
    - doy (day of the year)
    - epoch (seconds since Unix epoch)
    - isodow (day of the week where Monday is 0)
- expression: Time expression to operate on. Must be time VarRef.  

Example usage:

SELECT * FROM some_measurement
WHERE time >= now() - 10d AND time <= now() AND (date_part(time, dow) != 0 AND date_part(time, dow) != 6)

devanbenz avatar Dec 02 '25 15:12 devanbenz

After doing some testing and writing up some code for this I've found that

SELECT date_part('dow', time) FROM foo

Where we have date_part(...)as the only field, this cannot work with the current storage engine semantics.

For example, let's take a look at math functions, specifically sin which does not aggregate (similar to date_part).

Given a schema which contains uuuu as an integer using some sample data locally.

Running SELECT sin(uuuu) FROM test_foo returns

> select sin(uuuu) from test_foo
name: test_foo
time                sin
----                ---
1764702687714533000 -0.45990349068959124
1764703190922883000 -0.1464976141626518
1764704886398781000 0.6360869563962336
1764705088831162000 -0.8299531227173891
1764705880556364000 -0.7854470419381677
1764863647789047000 -0.7854470419381677

Running SELECT sin(time) FROM test_foo returns nothing.

> select sin(time) from test_foo
>

date_part has similar semantics, but, because it operates on the time field it does require that there are other fields in the SELECT clause.

> select uuuu, date_part('dow', time) from test_foo
name: test_foo
time                uuuu    date_part
----                ----    ---------
1764702687714533000 123     2
1764703190922883000 144444  2
1764704886398781000 9999    2
1764705088831162000 7777777 2
1764705880556364000 8989    2
1764863647789047000 8989    4
> select date_part('dow', time) from test_foo // without additional fields
>

Another way that date_part is similar to math functions:

With a math function, the field is created after WHERE clause evaluation.

> select sin(uuuu) from test_foo WHERE sin > 0
> select sin(uuuu) from test_foo WHERE sin(uuuu) > 0
name: test_foo
time                sin
----                ---
1764704886398781000 0.6360869563962336
>

This means that we must call the same function signature within a WHERE clause when evaluating the predicate.

date_part works the same way:

> select uuuu, date_part('dow', time) from test_foo WHERE date_part = 2
> select uuuu, date_part('dow', time) from test_foo WHERE date_part('dow', time) = 2
name: test_foo
time                uuuu    date_part
----                ----    ---------
1764702687714533000 123     2
1764703190922883000 144444  2
1764704886398781000 9999    2
1764705088831162000 7777777 2
1764705880556364000 8989    2
>

If we would like it to work differently, i.e. allow single select semantics, and implement filtering on the column instead of function call, this would require non-trivial changes to the storage engine.

devanbenz avatar Dec 08 '25 19:12 devanbenz