Make pipeline creation idempotent
What type of enhancement is this?
User experience
What does the enhancement do?
Current Implementation
Pipeline creation is currently not idempotent. For example, when we create the same pipeline twice:
# 1
curl -X "POST" "http://localhost:4000/v1/events/pipelines/nginx_pipeline" -F "[email protected]"
# 2 Create the same pipeline again.
curl -X "POST" "http://localhost:4000/v1/events/pipelines/nginx_pipeline" -F "[email protected]"
This stores multiple rows for the same pipeline in greptime_private.pipelines:
mysql> select name, schema, created_at from greptime_private.pipelines;
+----------------+--------+----------------------------+
| name           | schema | created_at                 |
+----------------+--------+----------------------------+
| nginx_pipeline | public | 2024-12-03 03:04:08.220261 |
| nginx_pipeline | public | 2024-12-03 03:04:26.441025 |
+----------------+--------+----------------------------+
2 rows in set (0.02 sec)
Expectation
In my opinion, a pipeline should be unique throughout its lifetime. We can use (name, schema) as the unique constraint. When we create a pipeline that already exists, it should be an UPDATE operation, which means the API /v1/events/pipelines/${pipeline} should have create_or_update semantics. Idempotent creation will be easier to use and operate.
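As a rough illustration of the expected create_or_update semantics, here is a minimal in-memory sketch. It is not GreptimeDB code: the `PipelineStore` class, its method names, and the `updated_at` field are all hypothetical and only model a (name, schema) unique key.

```python
from datetime import datetime, timezone


class PipelineStore:
    """Hypothetical in-memory stand-in for greptime_private.pipelines."""

    def __init__(self):
        # (name, schema) -> row; the dict key plays the role of the unique constraint.
        self._rows = {}

    def create_or_update(self, name, schema, content):
        """Insert the pipeline if it is new, otherwise overwrite it in place."""
        key = (name, schema)
        now = datetime.now(timezone.utc)
        existing = self._rows.get(key)
        created = existing is None
        self._rows[key] = {
            "name": name,
            "schema": schema,
            "content": content,
            # Keep the original creation time on update.
            "created_at": now if created else existing["created_at"],
            # Assumption: an updated_at column, which the current table may not have.
            "updated_at": now,
        }
        return "created" if created else "updated"

    def rows(self):
        return list(self._rows.values())


store = PipelineStore()
first = store.create_or_update("nginx_pipeline", "public", "processors: []")
second = store.create_or_update("nginx_pipeline", "public", "processors: []")
```

With this model, repeating the same request leaves a single row for nginx_pipeline instead of two.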
Implementation challenges
No response
Internally, @paomian implemented a versioning system for pipelines, so the latest version is always used for parsing data. There is also a version parameter that lets you select an exact pipeline by name and creation time.
https://docs.greptime.com/user-guide/logs/write-logs#http-api
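To make the lookup behaviour described above concrete, here is a small sketch using the two rows from the issue. The `resolve_pipeline` function and the row layout are hypothetical and only illustrate "latest by default, exact creation time when a version is given"; the real implementation may differ.

```python
from datetime import datetime

# The two rows shown in the issue description.
ROWS = [
    {"name": "nginx_pipeline", "schema": "public",
     "created_at": datetime.fromisoformat("2024-12-03 03:04:08.220261")},
    {"name": "nginx_pipeline", "schema": "public",
     "created_at": datetime.fromisoformat("2024-12-03 03:04:26.441025")},
]


def resolve_pipeline(rows, name, version=None):
    """Return the row for `name`: the newest one, or the one created at `version`."""
    candidates = [r for r in rows if r["name"] == name]
    if version is not None:
        # The version is the creation timestamp, so it must match exactly.
        candidates = [r for r in candidates if r["created_at"] == version]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["created_at"])


latest = resolve_pipeline(ROWS, "nginx_pipeline")
pinned = resolve_pipeline(
    ROWS, "nginx_pipeline",
    version=datetime.fromisoformat("2024-12-03 03:04:08.220261"),
)
```

Note that pinning requires the full timestamp down to the microsecond, which the caller first has to look up.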
@sunng87 @paomian I think using the creation timestamp as the implicit version makes this very hard to use. The user has to query the exact creation time before they can reference a version.
We can consider the following approaches:
- If the user doesn't specify the version field when creating, always store only the latest version of the pipeline. It's very confusing to get multiple pipelines if the user queries from the greptime_private.pipelines table. Actually, it's enough to store the latest one;
- If the user specifies the version explicitly, follow the current logic;
Furthermore, maybe we should refactor the docs and add the version field in creation.
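The two approaches listed above could be sketched roughly as follows. This is only a model under stated assumptions: `VersionedPipelineStore` and its methods are hypothetical names, and "follow the current logic" is interpreted here as keeping one row per explicit version.

```python
class VersionedPipelineStore:
    """Hypothetical model: one unversioned slot per name, plus explicit versions."""

    def __init__(self):
        # (name, version) -> content; version is None for the unversioned slot.
        self._rows = {}

    def create(self, name, content, version=None):
        # Without a version, the (name, None) slot is overwritten, so only the
        # latest content is kept. With a version, each version gets its own row.
        self._rows[(name, version)] = content

    def versions(self, name):
        """List the stored versions for `name` (None means unversioned)."""
        return [v for (n, v) in self._rows if n == name]

    def get(self, name, version=None):
        return self._rows.get((name, version))


vstore = VersionedPipelineStore()
vstore.create("nginx_pipeline", "processors: [a]")
vstore.create("nginx_pipeline", "processors: [b]")        # replaces the row above
vstore.create("nginx_pipeline", "processors: [c]", version="v1")
vstore.create("nginx_pipeline", "processors: [d]", version="v2")
```

Here the two unversioned creations collapse into one row, while v1 and v2 remain individually addressable.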