greptimedb icon indicating copy to clipboard operation
greptimedb copied to clipboard

Align SemanticType definition in proto with other places

Open waynexia opened this issue 2 years ago • 3 comments

What type of enhancement is this?

Refactor

What does the enhancement do?

The three semantic type defined in proto are TIMESTAMP, TAG and FIELD

https://github.com/GreptimeTeam/greptimedb/blob/8faa6b0f09ddc75f5a5423f7774e9bb40df9fadc/src/api/greptime/v1/column.proto#L8-L12

But what in other places they are TIME INDEX, PRIMARY KEY and VALUE: https://github.com/GreptimeTeam/greptimedb/blob/8faa6b0f09ddc75f5a5423f7774e9bb40df9fadc/src/table/src/metadata.rs#L72-L78

https://github.com/GreptimeTeam/greptimedb/blob/8faa6b0f09ddc75f5a5423f7774e9bb40df9fadc/src/datatypes/src/schema.rs#L40-L45

This inconsistency is ambiguous (like #534), we'd better to align them.

Implementation challenges

No response

waynexia avatar Nov 17 '22 03:11 waynexia

+1. Sounds reasonable.

evenyag avatar Nov 17 '22 04:11 evenyag

I'm afraid that is related to frontend gRPC proto refactoring what @fengjiachun is working on.

Maybe we need two kinds of protos, for insert API,

  • for protos released to users: it should be simple and easy to understand. Users may already be familiar with tags/fields.
  • for protos used between frontend and datanode, tags/fields should already be converted to primary key indices/timestamp index, we should only keep the latter.

v0y4g3r avatar Nov 17 '22 04:11 v0y4g3r

I'm afraid that is related to frontend gRPC proto refactoring what @fengjiachun is working on.

Maybe we need two kinds of protos, for insert API,

  • for protos released to users: it should be simple and easy to understand. Users may already be familiar with tags/fields.
  • for protos used between frontend and datanode, tags/fields should already be converted to primary key indices/timestamp index, we should only keep the latter.

I'm doing some refactoring work on the gRPC proto. I don't think my work will affect this issue because I plan to share the Column structure in Users/Frontend/Datanode.

fengjiachun avatar Nov 17 '22 05:11 fengjiachun

Hi, I just want to confirm that TIMESTAMP is corresponded to TIME INDEX, TAG is corresponded to PRIMARY KEY and FIELD is corresponded to VALUE, and the point of this issue is to unify the naming across different files to remove the ambiguity? i.e. this is simply a renaming issue.

I think TIMESTAMP is better in between itself and TIME INDEX as timestamp is more of a universal accepted name. I don't know exactly the semantics of TAG and FIELD yet, so I don't have much opinion on them.

(I'm currently looking for a good first issue to start for greptime db contribution.)

qstommyshu avatar Apr 30 '23 20:04 qstommyshu

Thanks for your enthusiasm!

Both TAG and FIELD are concepts from time series database background. Simply speaking, TAG is the indexed column (thus PRIMARY KEY) and FIELD is the value data (thus previously VALUE).

I think TIMESTAMP is better in between itself and TIME INDEX as timestamp is more of a universal accepted name.

TIMESTAMP is also one of the value types. Thus it might lead to misunderstanding if we mix the TIMESTAMP type and TIMESTAMP semantic type. One scenario is the result in DESC TABLE clause, where data type and semantic types are displayed together.

This issue is a bit outdated. In #1326 I changed VALUE to FIELD. Things left to do in this issue is to change the definitions in protobuf. I'm not sure this part is still beginner friendly, as it involves too many components and is a breaking change. cc @fengjiachun @yuanbohan

waynexia avatar May 01 '23 02:05 waynexia

No worries, thanks for explaining what semantic type TAG and FIELD are to me. And what do you mean by "changing the definitions in protobuf", do you mean giving a more meaningful name to the enum definition in "column.proto" file? And can you explain why this would be a breaking change?

Thanks.

qstommyshu avatar May 01 '23 11:05 qstommyshu

And what do you mean by "changing the definitions in protobuf", do you mean giving a more meaningful name to the enum definition in "column.proto" file? And can you explain why this would be a breaking change?

Yes. And changing the protobuf itself is a breaking change.

waynexia avatar May 03 '23 09:05 waynexia