greptimedb
Tracking issue for dataflow framework
What problem does the new feature solve?
Being able to do simple continuous aggregation.
What does the feature do?
- Not a complete stream-processing system; only a necessary subset of functionality is provided.
- Can handle most aggregate operators within one table (e.g. sum, avg, min, max, and comparison operators). Others (join, trigger, txn, etc.) are not target features of this framework.
Implementation challenges
- [ ] Persistent intermediate state for operators
- [ ] Write more operators suitable for stream computation
- [ ] Limit the scope of system boundary to simple aggregation.
Implementation Progress
- [x] RFC #3185
- [x] Basic Types, Expressions&Functions:
- [x] #3186
- [x] #3267
- [x] #3283
- [x] #3359
- [x] #3396
- [ ] Dataflow Logical Plan Definition & Execution
- [x] #3490
- [x] #3508 and #3552
- [x] #3581
- [ ] #3736
- [ ] SQL/substrait to Dataflow Logical Plan translator
- [x] #3657
- [x] #3690
- [ ] Register tasks to metasrv, and coordinate everything to make flow usable: https://github.com/GreptimeTeam/greptimedb/issues/3664
@discord9 I read the RFC now and wonder what a complete sample of this feature looks like.

I can see how to create a task (continuous query / materialized view):

```sql
CREATE TASK avg_over_5m WINDOW_SIZE = "5m" AS SELECT avg(value) FROM table WHERE time > now() - 5m GROUP BY time(1m)
```

Then can we use `avg_over_5m` as a normal table reference in a query?
Yes, sorry for the late reply; GitHub's layout for issues is really terrible. This task also creates a result table `avg_over_5m` and writes to it with negligible delay, so one can naturally use `avg_over_5m` in a normal query.
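To illustrate, a minimal sketch of reading the result table (the task syntax above is from the RFC discussion, not final syntax; the `ts` column name here is an assumption for illustration):

```sql
-- avg_over_5m is the result table the task continuously writes to;
-- it can be read like any ordinary table in a normal query.
SELECT * FROM avg_over_5m ORDER BY ts DESC LIMIT 10;
```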
@discord9 I think we can close this issue right now. The next iteration could start with a new issue. What do you think?

Closing this issue, as we now have a basic dataflow framework; a new issue can be opened to track its next iteration.