Refactor frontend and datanode functionalities

Open v0y4g3r opened this issue 2 years ago • 5 comments

After some discussion, we reached a consensus that all protocol-related logic, including SQL, the InfluxDB line protocol, etc., should be migrated to frontend. Frontend communicates with datanode through our gRPC protocol.

image

As a result, the "standalone" mode of GreptimeDB should start both frontend and datanode in one process and let them communicate through an inter-process communication mechanism such as a Unix Domain Socket.

  • [ ] Move SQL parser logic from datanode to frontend #455
    • [x] CREATE TABLE
      • The CREATE TABLE protocol is already supported in frontend, but we still need to chain the whole process together, including table id allocation, table registration and catalog management in distributed mode.
    • [x] CREATE DATABASE
    • [x] SELECT
    • [x] SHOW DATABASES/SHOW TABLES
    • [ ] ALTER
      • [x] Add column
      • [ ] Drop column #466
  • [x] Glue frontend and datanode in one process #471
    • [x] The current solution creates and starts frontend from datanode as a service, which seems weird. We should further refactor this logic into the cmd package and provide 3 subcommands: --standalone/--frontend/--datanode #455

v0y4g3r avatar Nov 10 '22 11:11 v0y4g3r

+1 for this. This resolves #399

Using UDS still requires serialization and deserialization between the two parts, which may affect our performance benchmark. We need to set up a framework that supports both local and remote invocation for these gRPC functionalities.

sunng87 avatar Nov 10 '22 12:11 sunng87

We first separate frontend and datanode to make their responsibilities clearer, then identify the APIs these two components need in order to work together, build an abstraction over these APIs, and provide implementations based on method invocation for standalone mode and on RPC for distributed mode.

v0y4g3r avatar Nov 11 '22 15:11 v0y4g3r

Currently we cannot remove SQL parser logic from datanode completely, since some gRPC services directly pass SQL as a string from frontend to datanode. https://github.com/GreptimeTeam/greptimedb/blob/0d4c191a06418b046ecae9337a173c59ad45a43f/src/datanode/src/instance/grpc.rs#L192

If we want to remove sqlparser from datanode, let frontend parse SQL, and have it interact with datanode via an intermediate representation instead of a SQL string, we need to build a Protobuf definition for sqlparser-rs's query struct.

But I don't think this is our first priority.

v0y4g3r avatar Nov 11 '22 16:11 v0y4g3r

Thoughts on migrating the "auto-create/alter-on-insertion" feature from datanode to frontend:

  • "auto-create/alter-on-insertion" is a protocol-related feature that is generally used by OpenTSDB and Prometheus, so it's natural to move this feature to frontend.
  • More importantly, in distributed mode, the only place where "auto-create/alter-on-insertion" can happen is frontend. So in order to keep standalone mode and distributed mode behavior consistent, this feature should be moved to frontend.

In order to accommodate this, frontend must hold a catalog manager so that it can check whether a table exists and whether its schema matches.

  • In standalone mode, datanode and frontend share one single LocalCatalogManager.
  • In distributed mode, frontend and datanode are in different processes, and their catalog managers are both based on metasrv.

Which looks like: image
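
The existence and schema check described above can be sketched as follows. This is only an illustration under stated assumptions, not GreptimeDB's actual code: the `CatalogManager` trait shown here, `InMemoryCatalog`, `Frontend::prepare_insert` and `SchemaAction` are hypothetical names, and a table schema is reduced to a set of column names.

```rust
use std::collections::{BTreeSet, HashMap};
use std::sync::{Arc, Mutex};

/// Hypothetical catalog abstraction; the real trait in GreptimeDB differs.
trait CatalogManager: Send + Sync {
    /// Column names of `table`, or `None` if the table does not exist.
    fn columns(&self, table: &str) -> Option<BTreeSet<String>>;
    fn create_table(&self, table: &str, columns: BTreeSet<String>);
    fn add_columns(&self, table: &str, columns: BTreeSet<String>);
}

/// In-memory stand-in for a catalog manager shared inside one process.
#[derive(Default)]
struct InMemoryCatalog {
    tables: Mutex<HashMap<String, BTreeSet<String>>>,
}

impl CatalogManager for InMemoryCatalog {
    fn columns(&self, table: &str) -> Option<BTreeSet<String>> {
        self.tables.lock().unwrap().get(table).cloned()
    }

    fn create_table(&self, table: &str, columns: BTreeSet<String>) {
        self.tables.lock().unwrap().insert(table.to_string(), columns);
    }

    fn add_columns(&self, table: &str, columns: BTreeSet<String>) {
        if let Some(existing) = self.tables.lock().unwrap().get_mut(table) {
            existing.extend(columns);
        }
    }
}

/// What the frontend had to do to the schema before the insert can proceed.
#[derive(Debug, PartialEq)]
enum SchemaAction {
    Created,
    Altered(Vec<String>),
    Unchanged,
}

/// Hypothetical frontend holding a catalog manager field.
struct Frontend {
    catalog: Arc<dyn CatalogManager>,
}

impl Frontend {
    /// Creates the table if it is missing, or adds any columns the incoming
    /// write carries that the table does not have yet.
    /// Note: check and update are not atomic in this sketch.
    fn prepare_insert(&self, table: &str, columns: &[&str]) -> SchemaAction {
        let incoming: BTreeSet<String> = columns.iter().map(|c| c.to_string()).collect();
        match self.catalog.columns(table) {
            None => {
                self.catalog.create_table(table, incoming);
                SchemaAction::Created
            }
            Some(existing) => {
                let missing: BTreeSet<String> =
                    incoming.difference(&existing).cloned().collect();
                if missing.is_empty() {
                    SchemaAction::Unchanged
                } else {
                    let names: Vec<String> = missing.iter().cloned().collect();
                    self.catalog.add_columns(table, missing);
                    SchemaAction::Altered(names)
                }
            }
        }
    }
}
```

In this sketch, standalone mode would hand the same `Arc` to both frontend and datanode, while distributed mode would plug in a different `CatalogManager` implementation backed by metasrv.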

v0y4g3r avatar Nov 12 '22 10:11 v0y4g3r

In #455, we built a set of handle_xxx APIs for protocols. Protocol handlers should not be aware of the gRPC client directly (like instance.admin().instance.database()); instead, they convert requests from different protocols to InsertExpr/Select/CreateExpr/CreateDatabaseExpr... and call these handle_xxx APIs.

This handle_xxx API set can be extracted to a trait, maybe FrontendRequestHandler? We can then provide two FrontendRequestHandler implementations, based on method invocation and gRPC respectively. Those XxxxExpr types serve as an intermediate representation of all requests. Currently we only have one implementation, which is based on gRPC and uses a loopback address as the datanode address in standalone mode; this may introduce serialization overhead.
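
A rough sketch of what such a trait could look like, with one implementation per mode. Everything except the name FrontendRequestHandler is an assumption made for illustration: `Request` stands in for the XxxxExpr types, `Datanode` is an in-memory stand-in, and `RemoteHandler` mimics the RPC boundary with a toy text encoding rather than real gRPC.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Simplified stand-in for the XxxxExpr intermediate representations.
#[derive(Debug, Clone, PartialEq)]
enum Request {
    CreateTable { name: String },
    Insert { table: String, rows: Vec<(i64, f64)> },
}

#[derive(Debug, PartialEq)]
enum Response {
    AffectedRows(usize),
}

/// Minimal in-memory stand-in for the datanode.
#[derive(Default)]
struct Datanode {
    tables: Mutex<HashMap<String, Vec<(i64, f64)>>>,
}

impl Datanode {
    fn execute(&self, req: Request) -> Result<Response, String> {
        let mut tables = self.tables.lock().unwrap();
        match req {
            Request::CreateTable { name } => {
                tables.entry(name).or_default();
                Ok(Response::AffectedRows(0))
            }
            Request::Insert { table, rows } => {
                let stored = tables
                    .get_mut(&table)
                    .ok_or_else(|| format!("table not found: {table}"))?;
                let n = rows.len();
                stored.extend(rows);
                Ok(Response::AffectedRows(n))
            }
        }
    }
}

/// The abstraction protocol handlers would depend on.
trait FrontendRequestHandler {
    fn handle(&self, req: Request) -> Result<Response, String>;
}

/// Standalone mode: plain method invocation, no serialization.
struct LocalHandler {
    datanode: Arc<Datanode>,
}

impl FrontendRequestHandler for LocalHandler {
    fn handle(&self, req: Request) -> Result<Response, String> {
        self.datanode.execute(req)
    }
}

/// Toy wire format standing in for Protobuf: kind, table name, then one row per line.
fn encode(req: &Request) -> Vec<u8> {
    match req {
        Request::CreateTable { name } => format!("C\n{name}").into_bytes(),
        Request::Insert { table, rows } => {
            let mut s = format!("I\n{table}");
            for (ts, v) in rows {
                s.push_str(&format!("\n{ts} {v}"));
            }
            s.into_bytes()
        }
    }
}

fn decode(bytes: &[u8]) -> Result<Request, String> {
    let text = std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
    let mut lines = text.lines();
    let kind = lines.next().ok_or("empty message")?;
    let name = lines.next().ok_or("missing table name")?.to_string();
    match kind {
        "C" => Ok(Request::CreateTable { name }),
        "I" => {
            let mut rows = Vec::new();
            for line in lines {
                let (ts, v) = line.split_once(' ').ok_or("malformed row")?;
                let ts = ts.parse::<i64>().map_err(|e| e.to_string())?;
                let v = v.parse::<f64>().map_err(|e| e.to_string())?;
                rows.push((ts, v));
            }
            Ok(Request::Insert { table: name, rows })
        }
        other => Err(format!("unknown message kind: {other}")),
    }
}

/// Distributed mode stand-in: the request crosses an encode/decode boundary.
struct RemoteHandler {
    datanode: Arc<Datanode>,
}

impl FrontendRequestHandler for RemoteHandler {
    fn handle(&self, req: Request) -> Result<Response, String> {
        let wire = encode(&req); // the cost the local path avoids
        let decoded = decode(&wire)?;
        self.datanode.execute(decoded)
    }
}

/// A protocol handler only sees the trait, never a gRPC client.
fn write_point(
    handler: &dyn FrontendRequestHandler,
    table: &str,
    ts: i64,
    value: f64,
) -> Result<Response, String> {
    handler.handle(Request::Insert {
        table: table.to_string(),
        rows: vec![(ts, value)],
    })
}
```

With this shape, `write_point` behaves the same whether it is given a `LocalHandler` or a `RemoteHandler`; only the latter pays for encoding and decoding.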

Another issue is the intermediate representation. As discussed before, XxxxExpr is rather primitive now, especially InsertExpr, which directly serializes data points to bytes. If we use InsertExpr as the intermediate representation, it will introduce a double serialization/deserialization problem.

image

We need to either reimplement InsertExpr or use another representation for all insert requests in frontend.
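
To make the trade-off concrete, here is a small sketch contrasting a byte-oriented insert representation with a typed one. Both structs are hypothetical and do not mirror the real InsertExpr definition; the 16-byte row encoding is an assumption chosen only to keep the example short.

```rust
/// Hypothetical byte-oriented representation: every row is already
/// serialized when the expression is built.
struct BytesInsert {
    table: String,
    rows: Vec<Vec<u8>>,
}

/// Hypothetical typed representation: columns stay as plain values.
struct TypedInsert {
    table: String,
    timestamps: Vec<i64>,
    values: Vec<f64>,
}

/// Encodes one data point as 8 bytes of timestamp plus 8 bytes of value.
fn encode_row(ts: i64, value: f64) -> Vec<u8> {
    let mut buf = Vec::with_capacity(16);
    buf.extend_from_slice(&ts.to_le_bytes());
    buf.extend_from_slice(&value.to_le_bytes());
    buf
}

/// Decodes one data point; returns `None` if the buffer is too short.
fn decode_row(bytes: &[u8]) -> Option<(i64, f64)> {
    let mut ts = [0u8; 8];
    ts.copy_from_slice(bytes.get(0..8)?);
    let mut value = [0u8; 8];
    value.copy_from_slice(bytes.get(8..16)?);
    Some((i64::from_le_bytes(ts), f64::from_le_bytes(value)))
}

/// Even when called in-process, the byte-oriented form must be decoded
/// before the points can be used.
fn apply_bytes(insert: &BytesInsert) -> Option<Vec<(i64, f64)>> {
    insert.rows.iter().map(|r| decode_row(r)).collect()
}

/// The typed form can be consumed directly, with no decoding step.
fn apply_typed(insert: &TypedInsert) -> Vec<(i64, f64)> {
    insert
        .timestamps
        .iter()
        .copied()
        .zip(insert.values.iter().copied())
        .collect()
}
```

In the byte-oriented form, a request sent over gRPC is serialized twice (rows first, then the enclosing message), and a purely local call still pays for one encode/decode round trip. The typed form defers serialization to the transport, so the method-invocation path would avoid it entirely.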

v0y4g3r avatar Nov 14 '22 03:11 v0y4g3r