Deprecate custom UDF time_bucket in favor of date_bin
Describe This Problem
In early version of CeresDB, there is a custom UDF time_bucket to align time of values, but now datafusion support it out of box, so I suggest deprecate our time_bucket, and use data_bin instead. Ex:
SELECT
DATE_BIN(INTERVAL '15' minute, time, TIMESTAMP '2001-01-01T00:00:00Z') AS time,
val
FROM (
VALUES
(TIMESTAMP '2021-06-10 17:05:00Z', 0.5),
(TIMESTAMP '2021-06-10 17:19:10Z', 0.3)
) as t (time, val)
"+---------------------+-----+",
"| time | val |",
"+---------------------+-----+",
"| 2021-06-10 17:00:00 | 0.5 |",
"| 2021-06-10 17:15:00 | 0.3 |",
"+---------------------+-----+",
Proposal
- Completely remove it before 1.0 is out.
- If cannot remove it, add a deprecation warning.
Additional Context
https://github.com/apache/arrow-datafusion/issues/3015
I wonder whether time_bucket can be implemented by data_bin. And I prefer to keep the time_bucket UDF if it can, because time_bucket has been provided for use in some scenarios.
I think yes, those two functions are almost the same, only with difference in how to set time duration
- date_bin:
INTERVAL '15' minute - time_bucket:
PT15M
Anyone interested in this issue can try reimplement time_bucket based on date_bin
Found one bug related with time_bucket: https://github.com/CeresDB/ceresdb/issues/563
assigne me.
date_bin seems to have less functionality than time_bucket, and there is a big difference between the two. Do I just need to keep date_bin's date_bin(stride, source, origin) syntax?
ScalarFunction::DateBin => Ok(date_bin(
parse_expr(&args[0], registry)?,
parse_expr(&args[1], registry)?,
parse_expr(&args[2], registry)?,
)),
Yes, in fact only first two args is used in time_bucket.
fn new_udf() -> ScalarUdf {
// args:
// - timestamp column.
// - period.
// - input timestamp format in PARTITION BY (unsed now).
// - input timezone (ignored now).
// - timestamp output format (ignored now).
let func = |args: &[ColumnarValue]| {
let bucket = TimeBucket::parse_args(args)
.box_err()
.context(InvalidArguments)?;
let result_column = bucket.call().box_err().context(CallFunction)?;
Ok(ColumnarValue::Array(result_column))
};