Tracking issue for dataflow framework

Opened by discord9

What problem does the new feature solve?

It makes simple continuous aggregation possible.

What does the feature do?

  • It is not a complete stream-processing system. Only the minimal necessary subset of functionality is provided.
  • It handles the most common aggregate functions over a single table (e.g. sum, avg, min, max), together with comparison operators. Other operations (joins, triggers, transactions, etc.) are out of scope.
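The sketch below illustrates what "continuous aggregation" means here. It keeps a small piece of intermediate state per group and folds each newly inserted row into that state. As a result, sum, avg, min and max can be read at any time without rescanning the table. This is a hypothetical Python model: `AggState` and `on_insert` are illustrative names, not GreptimeDB APIs. It also assumes insert-only input, because supporting deletes would need extra state for min and max.

```python
from dataclasses import dataclass
from typing import Dict, Hashable


@dataclass
class AggState:
    """Hypothetical intermediate state for one group (not a GreptimeDB type)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        # Fold one newly inserted value into the running state in O(1).
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def avg(self) -> float:
        # avg is derived from (total, count) rather than stored directly.
        return self.total / self.count


def on_insert(states: Dict[Hashable, AggState], key: Hashable, value: float) -> None:
    # Look up (or create) the group's state and update it incrementally.
    states.setdefault(key, AggState()).update(value)


states: Dict[Hashable, AggState] = {}
for host, v in [("a", 1.0), ("a", 3.0), ("b", 5.0)]:
    on_insert(states, host, v)

print(states["a"].avg(), states["a"].minimum, states["a"].maximum)  # 2.0 1.0 3.0
```

Storing avg as a (total, count) pair is what makes it updatable row by row. That pair is also the kind of operator state the "Persistent intermediate state" item would need to save.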

Implementation challenges

  • [ ] Persistent intermediate state for operators
  • [ ] Write more operators suited to stream computation
  • [ ] Limit the system's scope to simple aggregation

Implementation Progress

  • [x] RFC #3185
  • [x] Basic Types, Expressions & Functions:
    • [x] #3186
    • [x] #3267
    • [x] #3283
    • [x] #3359
    • [x] #3396
  • [ ] Dataflow Logical Plan Definition & Execution
    • [x] #3490
    • [x] #3508 and #3552
    • [x] #3581
    • [ ] #3736
  • [ ] SQL/substrait to Dataflow Logical Plan translator
    • [x] #3657
    • [x] #3690
  • [ ] Register tasks with metasrv and coordinate all components to make flow usable: https://github.com/GreptimeTeam/greptimedb/issues/3664

discord9, Jan 18 '24

@discord9 I have now read the RFC, and I wonder what a complete example of this feature looks like.

I can see how to create a task (continuous query / materialized view):

CREATE TASK avg_over_5m WINDOW_SIZE = "5m" AS SELECT avg(value) FROM table WHERE time > now() - 5m GROUP BY time(1m)

Can we then reference avg_over_5m as a normal table in queries?

tisonkun, Mar 06 '24


Yes, and sorry for the late reply; GitHub's layout for issues is really terrible. This task also creates a result table named avg_over_5m and writes to it with negligible delay, so you can naturally use avg_over_5m in normal queries.
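The sketch below illustrates the behaviour described above. It assumes the task buckets rows into 1-minute windows and keeps only windows from roughly the last 5 minutes, loosely mirroring `GROUP BY time(1m)` and `time > now() - 5m` in the example. The `ResultTable` class, its methods, and the integer-seconds timestamps are all hypothetical. The point is that every insert immediately updates the result row for its window, so reading the result table is an ordinary lookup rather than a recomputation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW = 60       # bucket width in seconds, i.e. time(1m)
RETENTION = 300   # keep windows overlapping the last 5 minutes


class ResultTable:
    """Hypothetical stand-in for a materialized result table like avg_over_5m."""

    def __init__(self) -> None:
        # window start (seconds) -> [sum, count]
        self._state: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0])

    def on_insert(self, ts: int, value: float) -> None:
        start = ts - ts % WINDOW  # align the timestamp to its 1-minute bucket
        acc = self._state[start]
        acc[0] += value
        acc[1] += 1

    def rows(self, now: int) -> List[Tuple[int, float]]:
        # Drop windows that end at or before now - 5m (an approximation of the
        # time filter at window granularity), then emit (window_start, avg) rows.
        for start in [s for s in self._state if s + WINDOW <= now - RETENTION]:
            del self._state[start]
        return sorted((s, acc[0] / acc[1]) for s, acc in self._state.items())


table = ResultTable()
for ts, v in [(0, 1.0), (30, 3.0), (65, 10.0), (400, 4.0)]:
    table.on_insert(ts, v)

print(table.rows(now=420))  # [(360, 4.0)]
```

Keeping a (sum, count) pair per window, rather than raw rows, is what lets each insert update the result in constant time.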

discord9, Apr 08 '24

@discord9 I think we can close this issue now and start the next iteration in a new issue. What do you think?

killme2008, May 31 '24

Closing this issue, since we now have a basic dataflow framework. A new issue can track its next iteration.

discord9, Jun 03 '24