greptimedb
Tracking issue for dataflow framework
What problem does the new feature solve?
Being able to do simple continuous aggregation.
What does the feature do?
- Not a complete stream-processing system; only a necessary subset of functionality is provided.
- Can handle most aggregate operators within one table (e.g. sum, avg, min, max, and comparison operators). Others (join, trigger, txn, etc.) are not target features of this framework.
Implementation challenges
- [ ] Persistent intermediate state for operators
- [ ] Write more operators suitable for stream computation
- [ ] Limit the scope of system boundary to simple aggregation.
Implementation Progress
- [x] RFC #3185
- [x] Basic Types, Expressions&Functions:
- [x] #3186
- [x] #3267
- [x] #3283
- [x] #3359
- [x] #3396
- [ ] Dataflow Logical Plan Definition & Execution
- [x] #3490
- [x] #3508 and #3552
- [x] #3581
- [ ] #3736
- [ ] SQL/substrait to Dataflow Logical Plan translator
- [x] #3657
- [x] #3690
- [ ] Register tasks to metasrv, and coordinate everything to make flow usable: https://github.com/GreptimeTeam/greptimedb/issues/3664
@discord9 I read the RFC now and wonder what a complete sample of this feature looks like.

I can see how to create a task (continuous query / materialized view):

```sql
CREATE TASK avg_over_5m WINDOW_SIZE = "5m" AS SELECT avg(value) FROM table WHERE time > now() - 5m GROUP BY time(1m)
```

Then can we use `avg_over_5m` as a normal table reference in a query?
Yes, sorry for the late reply; GitHub's layout for issues is really terrible. This task also creates a result table `avg_over_5m` and writes to it with negligible delay, so one can naturally use `avg_over_5m` in a normal query.
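To illustrate, a minimal sketch of reading the result table (the task syntax above is from the RFC discussion, not final syntax; the `ts` column name here is an assumption for illustration):

```sql
-- avg_over_5m is the result table the task continuously writes to;
-- it can be read like any ordinary table in a normal query.
SELECT * FROM avg_over_5m ORDER BY ts DESC LIMIT 10;
```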
@discord9 I think we can close this issue right now. The next iteration could start with a new issue. What do you think?

Closing this issue, as we now have a basic dataflow framework; a new issue can be opened to track its next iteration.