
Make pipeline creation idempotent

Open zyy17 opened this issue 1 year ago • 2 comments

What type of enhancement is this?

User experience

What does the enhancement do?

Current Implementation

Pipeline creation is currently not idempotent. For example, when we create the same pipeline twice:

# 1
curl -X "POST" "http://localhost:4000/v1/events/pipelines/nginx_pipeline" -F "[email protected]"

# 2 Create the same pipeline again.
curl -X "POST" "http://localhost:4000/v1/events/pipelines/nginx_pipeline" -F "[email protected]"

It stores multiple rows for the same pipeline in greptime_private.pipelines:

mysql> select name, schema, created_at from greptime_private.pipelines;
+----------------+--------+----------------------------+
| name           | schema | created_at                 |
+----------------+--------+----------------------------+
| nginx_pipeline | public | 2024-12-03 03:04:08.220261 |
| nginx_pipeline | public | 2024-12-03 03:04:26.441025 |
+----------------+--------+----------------------------+
2 rows in set (0.02 sec)

Expectation

In my opinion, a pipeline should be unique throughout its lifetime. We can use (name, schema) as the unique constraint. Creating a pipeline that already exists should be an UPDATE operation, which means the semantics of the API /v1/events/pipelines/${pipeline} should be create_or_update. Idempotent creation would be easier to use and operate.
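As a rough illustration of the create_or_update semantics described above, here is a minimal sketch that uses SQLite as a stand-in store. The table layout, the `content` and `updated_at` columns, and the `create_or_update_pipeline` helper are all assumptions made for this example; they are not GreptimeDB's actual schema or code.

```python
import sqlite3

# SQLite is only a stand-in here; this is NOT GreptimeDB's real storage or schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pipelines (
        name       TEXT NOT NULL,
        "schema"   TEXT NOT NULL,
        content    TEXT NOT NULL,
        updated_at TEXT NOT NULL,
        UNIQUE (name, "schema")
    )
    """
)


def create_or_update_pipeline(name, schema, content, now):
    """Insert the pipeline, or overwrite it when (name, schema) already exists."""
    conn.execute(
        """
        INSERT INTO pipelines (name, "schema", content, updated_at)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (name, "schema") DO UPDATE SET
            content = excluded.content,
            updated_at = excluded.updated_at
        """,
        (name, schema, content, now),
    )


# Creating the same pipeline twice leaves a single row behind.
create_or_update_pipeline("nginx_pipeline", "public", "definition", "2024-12-03 03:04:08")
create_or_update_pipeline("nginx_pipeline", "public", "definition", "2024-12-03 03:04:26")

rows = conn.execute(
    'SELECT name, "schema", content, updated_at FROM pipelines'
).fetchall()
print(rows)
```

With the unique constraint on (name, schema), the second call updates the existing row instead of adding a new one, so repeating the request is safe.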

Implementation challenges

No response

zyy17 avatar Dec 03 '24 03:12 zyy17

Internally, @paomian implemented a versioning system for pipelines, so the latest version is always used for parsing data. There is also a version parameter that lets you select an exact pipeline by its name and creation time.

https://docs.greptime.com/user-guide/logs/write-logs#http-api
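The lookup behavior described in this comment can be sketched roughly as follows. The row shape mirrors the query output earlier in the issue; the `resolve_pipeline` function and its parameters are hypothetical names for illustration, not the actual implementation.

```python
from datetime import datetime

# Hypothetical rows, shaped like the greptime_private.pipelines output above.
ROWS = [
    {"name": "nginx_pipeline", "schema": "public",
     "created_at": datetime(2024, 12, 3, 3, 4, 8, 220261)},
    {"name": "nginx_pipeline", "schema": "public",
     "created_at": datetime(2024, 12, 3, 3, 4, 26, 441025)},
]


def resolve_pipeline(rows, name, version=None):
    """Return the row for `name`: the newest one by default, or the one whose
    creation time equals `version` when a version is given."""
    candidates = [r for r in rows if r["name"] == name]
    if version is not None:
        candidates = [r for r in candidates if r["created_at"] == version]
    if not candidates:
        return None
    # Assumption: the creation timestamp acts as the implicit version.
    return max(candidates, key=lambda r: r["created_at"])


latest = resolve_pipeline(ROWS, "nginx_pipeline")
pinned = resolve_pipeline(ROWS, "nginx_pipeline", version=ROWS[0]["created_at"])
```

Pinning a version in this model requires knowing the exact creation timestamp of the row you want.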

sunng87 avatar Dec 03 '24 06:12 sunng87

@sunng87 @paomian I think using the creation timestamp as the implicit version makes this very hard to use: the user has to query the exact timestamp before they can reference a version.

We can consider the following approaches:

  • If the user doesn't specify the version field when creating, always store only the latest version of the pipeline. Getting multiple pipelines back when querying the greptime_private.pipelines table is very confusing, and storing the latest one is actually enough;

  • If the user specifies the version explicitly, follow the current logic;

Furthermore, maybe we should refactor the docs and add the version field in creation.
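One possible reading of the two approaches above, as an in-memory sketch. The `PipelineStore` class, its `create` signature, and the choice to key explicit versions by a user-supplied label are all assumptions for illustration; the issue does not specify how an explicit version would be stored.

```python
class PipelineStore:
    """Toy model: one overwritable slot per (name, schema) when no version is
    given, plus separately stored entries for explicitly versioned creates."""

    def __init__(self):
        # (name, schema) -> {version label or None: pipeline content}
        self._pipelines = {}

    def create(self, name, schema, content, version=None):
        versions = self._pipelines.setdefault((name, schema), {})
        # version=None always overwrites the single unversioned slot, so
        # repeated creates without a version never add rows.
        versions[version] = content

    def rows(self):
        return [
            (name, schema, version)
            for (name, schema), versions in self._pipelines.items()
            for version in versions
        ]


store = PipelineStore()
store.create("nginx_pipeline", "public", "definition A")
store.create("nginx_pipeline", "public", "definition B")  # overwrites A
unversioned_rows = store.rows()

store.create("nginx_pipeline", "public", "definition C", version="v2")
all_rows = store.rows()
```

In this model, a user who never passes a version sees exactly one row per pipeline, while explicitly versioned pipelines remain individually addressable.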

zyy17 avatar Dec 03 '24 07:12 zyy17