greptimedb
greptimedb copied to clipboard
Align SemanticType definition in proto with other places
What type of enhancement is this?
Refactor
What does the enhancement do?
The three semantic type defined in proto are TIMESTAMP
, TAG
and FIELD
https://github.com/GreptimeTeam/greptimedb/blob/8faa6b0f09ddc75f5a5423f7774e9bb40df9fadc/src/api/greptime/v1/column.proto#L8-L12
But what in other places they are TIME INDEX
, PRIMARY KEY
and VALUE
:
https://github.com/GreptimeTeam/greptimedb/blob/8faa6b0f09ddc75f5a5423f7774e9bb40df9fadc/src/table/src/metadata.rs#L72-L78
https://github.com/GreptimeTeam/greptimedb/blob/8faa6b0f09ddc75f5a5423f7774e9bb40df9fadc/src/datatypes/src/schema.rs#L40-L45
This inconsistency is ambiguous (like #534), we'd better to align them.
Implementation challenges
No response
+1. Sounds reasonable.
I'm afraid that is related to frontend gRPC proto refactoring what @fengjiachun is working on.
Maybe we need two kinds of protos, for insert API,
- for protos released to users: it should be simple and easy to understand. Users may already be familiar with tags/fields.
- for protos used between frontend and datanode, tags/fields should already be converted to primary key indices/timestamp index, we should only keep the latter.
I'm afraid that is related to frontend gRPC proto refactoring what @fengjiachun is working on.
Maybe we need two kinds of protos, for insert API,
- for protos released to users: it should be simple and easy to understand. Users may already be familiar with tags/fields.
- for protos used between frontend and datanode, tags/fields should already be converted to primary key indices/timestamp index, we should only keep the latter.
I'm doing some refactoring work on the gRPC proto.
I don't think my work will affect this issue because I plan to share the Column
structure in Users/Frontend/Datanode.
Hi,
I just want to confirm that TIMESTAMP
is corresponded to TIME INDEX
, TAG
is corresponded to PRIMARY KEY
and FIELD
is corresponded to VALUE
, and the point of this issue is to unify the naming across different files to remove the ambiguity? i.e. this is simply a renaming issue.
I think TIMESTAMP
is better in between itself and TIME INDEX
as timestamp is more of a universal accepted name.
I don't know exactly the semantics of TAG
and FIELD
yet, so I don't have much opinion on them.
(I'm currently looking for a good first issue to start for greptime db contribution.)
Thanks for your enthusiasm!
Both TAG
and FIELD
are concepts from time series database background. Simply speaking, TAG
is the indexed column (thus PRIMARY KEY
) and FIELD
is the value data (thus previously VALUE
).
I think TIMESTAMP is better in between itself and TIME INDEX as timestamp is more of a universal accepted name.
TIMESTAMP
is also one of the value types. Thus it might lead to misunderstanding if we mix the TIMESTAMP
type and TIMESTAMP
semantic type. One scenario is the result in DESC TABLE
clause, where data type and semantic types are displayed together.
This issue is a bit outdated. In #1326 I changed VALUE
to FIELD
. Things left to do in this issue is to change the definitions in protobuf. I'm not sure this part is still beginner friendly, as it involves too many components and is a breaking change. cc @fengjiachun @yuanbohan
No worries, thanks for explaining what semantic type TAG
and FIELD
are to me.
And what do you mean by "changing the definitions in protobuf", do you mean giving a more meaningful name to the enum definition in "column.proto" file?
And can you explain why this would be a breaking change?
Thanks.
And what do you mean by "changing the definitions in protobuf", do you mean giving a more meaningful name to the enum definition in "column.proto" file? And can you explain why this would be a breaking change?
Yes. And changing the protobuf itself is a breaking change.