greptimedb
Refactor frontend and datanode functionalities
After some discussion, we came to a consensus that all protocol-related logic, including SQL, the InfluxDB line protocol, etc., should be migrated to the frontend. The frontend communicates with the datanode through our gRPC protocol.
As a result, the "standalone" mode of GreptimeDB should start both the frontend and the datanode in one process and have them communicate through an inter-process mechanism such as a Unix domain socket.
- [ ] Move SQL parser logic from datanode to frontend #455
  - [x] CREATE TABLE
    - The CREATE TABLE protocol is already supported in the frontend, but in distributed mode we still need to chain the whole process together, including table ID allocation, table registration, and catalog management.
  - [x] CREATE DATABASE
  - [x] SELECT
  - [x] SHOW DATABASES/SHOW TABLES
  - [ ] ALTER
    - [x] Add column
    - [ ] Drop column #466
- [x] Glue frontend and datanode in one process #471
  - [x] The current solution creates and starts the frontend from within the datanode as a service, which seems weird. We should further refactor this logic into the `cmd` package and provide three subcommands: `--standalone`/`--frontend`/`--datanode` #455
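A minimal sketch of how a `cmd` entry point might choose between the three modes. `Mode` and `parse_mode` are hypothetical names, only the standard library is used, and the rule that exactly one mode flag must be given is our assumption; this is not GreptimeDB's actual CLI code.

```rust
/// Hypothetical run mode chosen on the command line.
#[derive(Debug, PartialEq)]
enum Mode {
    Standalone,
    Frontend,
    Datanode,
}

/// Picks the mode from the arguments. Exactly one mode flag is required (an assumption);
/// any other argument is skipped here and left for the selected component to handle.
fn parse_mode<I: IntoIterator<Item = String>>(args: I) -> Result<Mode, String> {
    let mut mode = None;
    for arg in args {
        let m = match arg.as_str() {
            "--standalone" => Mode::Standalone,
            "--frontend" => Mode::Frontend,
            "--datanode" => Mode::Datanode,
            _ => continue,
        };
        // `replace` returns the previous value, so `Some` means a second mode flag was given.
        if mode.replace(m).is_some() {
            return Err("only one of --standalone, --frontend, --datanode is allowed".to_string());
        }
    }
    mode.ok_or_else(|| "expected one of --standalone, --frontend, --datanode".to_string())
}
```

The binary's `main` would call `parse_mode(std::env::args().skip(1))` and, for `Mode::Standalone`, start both components in the same process.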
+1 for this. This resolves #399
Using UDS still requires serialization and deserialization between the two parts, which may hurt our performance benchmarks. We need to set up a framework that supports both local and remote invocation of these gRPC functionalities.
We should first separate the frontend and the datanode so that each has clearer responsibilities, then identify the APIs these two components need in order to work together, build an abstraction over these APIs, and provide implementations based on method invocation and RPC for standalone mode and distributed mode, respectively.
Currently we cannot completely remove the SQL parser logic from the datanode, since some gRPC services pass SQL directly as a string from the frontend to the datanode: https://github.com/GreptimeTeam/greptimedb/blob/0d4c191a06418b046ecae9337a173c59ad45a43f/src/datanode/src/instance/grpc.rs#L192
If we want to remove sqlparser from the datanode, let the frontend parse SQL, and interact with the datanode via an intermediate representation instead of a SQL string, we need to build a Protobuf definition for sqlparser-rs's query struct.
But I don't think this is our first priority.
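To illustrate what "an intermediate representation instead of a SQL string" means, here is a sketch with a hypothetical `Expr` enum: the frontend turns SQL into `Expr`, and the datanode only ever executes `Expr`. A naive whitespace tokenizer stands in for sqlparser-rs, only two statement shapes are recognized, and a real version would define these variants as Protobuf messages rather than a Rust enum.

```rust
/// Hypothetical IR the frontend would send instead of raw SQL text.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    CreateDatabase { name: String },
    DropColumn { table: String, column: String },
}

/// Frontend side: parse SQL into `Expr`.
/// A naive whitespace tokenizer stands in for sqlparser-rs here.
fn frontend_plan(sql: &str) -> Result<Expr, String> {
    let tokens: Vec<String> = sql
        .trim_end_matches(';')
        .split_whitespace()
        .map(|t| t.to_string())
        .collect();
    // Keywords are matched case-insensitively; identifiers keep their original case.
    let upper: Vec<String> = tokens.iter().map(|t| t.to_uppercase()).collect();
    let kw: Vec<&str> = upper.iter().map(String::as_str).collect();
    match kw.as_slice() {
        ["CREATE", "DATABASE", _] => Ok(Expr::CreateDatabase { name: tokens[2].clone() }),
        ["ALTER", "TABLE", _, "DROP", "COLUMN", _] => Ok(Expr::DropColumn {
            table: tokens[2].clone(),
            column: tokens[5].clone(),
        }),
        _ => Err(format!("unsupported statement: {sql}")),
    }
}

/// Datanode side: executes the IR directly and never sees SQL text.
fn datanode_execute(expr: &Expr) -> String {
    match expr {
        Expr::CreateDatabase { name } => format!("created database {name}"),
        Expr::DropColumn { table, column } => format!("dropped column {column} from {table}"),
    }
}
```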
Thoughts on migrating the "auto-create/alter-on-insertion" feature from datanode to frontend:
- "auto-create/alter-on-insertion" is a protocol-related feature that is generally used with OpenTSDB and Prometheus, so it's natural to move this feature to the frontend.
- More importantly, in distributed mode the frontend is the only place where "auto-create/alter-on-insertion" can happen. So, to keep standalone mode and distributed mode behavior consistent, this feature should be moved to the frontend.
To accommodate this, the frontend must hold a catalog manager field so it can check whether a table exists and whether its schema matches.
- In standalone mode, the datanode and frontend share a single `LocalCatalogManager`.
- In distributed mode, the frontend and datanode run in different processes, and both of their catalog managers are based on metasrv.
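A minimal sketch of auto-create/alter-on-insertion living in the frontend, with one catalog instance shared in standalone mode. `CatalogManager`, `SchemaAction`, `standalone`, and the column-names-only schema are simplified assumptions, not GreptimeDB's actual types; the `LocalCatalogManager` here is an in-memory stand-in that only borrows the name. In distributed mode, a metasrv-backed implementation would sit behind the same trait.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical catalog interface; a schema is reduced to its column names.
trait CatalogManager: Send + Sync {
    fn columns(&self, table: &str) -> Option<Vec<String>>;
    fn create_table(&self, table: &str, columns: &[String]);
    fn add_columns(&self, table: &str, columns: &[String]);
}

/// In-memory stand-in: table name -> column names.
#[derive(Default)]
struct LocalCatalogManager {
    tables: Mutex<HashMap<String, Vec<String>>>,
}

impl CatalogManager for LocalCatalogManager {
    fn columns(&self, table: &str) -> Option<Vec<String>> {
        self.tables.lock().unwrap().get(table).cloned()
    }
    fn create_table(&self, table: &str, columns: &[String]) {
        self.tables.lock().unwrap().insert(table.to_string(), columns.to_vec());
    }
    fn add_columns(&self, table: &str, columns: &[String]) {
        if let Some(existing) = self.tables.lock().unwrap().get_mut(table) {
            existing.extend_from_slice(columns);
        }
    }
}

/// What the frontend had to do before forwarding an insert.
#[derive(Debug, PartialEq)]
enum SchemaAction {
    Unchanged,
    CreatedTable,
    AddedColumns(Vec<String>),
}

struct Frontend {
    catalog: Arc<dyn CatalogManager>,
}

impl Frontend {
    /// Auto-create/alter-on-insertion, done in the frontend in both modes.
    /// Check and create/alter are separate calls here, so concurrent inserts
    /// would need extra coordination in a real implementation.
    fn ensure_schema(&self, table: &str, insert_columns: &[String]) -> SchemaAction {
        match self.catalog.columns(table) {
            None => {
                self.catalog.create_table(table, insert_columns);
                SchemaAction::CreatedTable
            }
            Some(existing) => {
                let missing: Vec<String> = insert_columns
                    .iter()
                    .filter(|c| !existing.contains(*c))
                    .cloned()
                    .collect();
                if missing.is_empty() {
                    SchemaAction::Unchanged
                } else {
                    self.catalog.add_columns(table, &missing);
                    SchemaAction::AddedColumns(missing)
                }
            }
        }
    }
}

struct Datanode {
    catalog: Arc<dyn CatalogManager>,
}

impl Datanode {
    fn has_table(&self, table: &str) -> bool {
        self.catalog.columns(table).is_some()
    }
}

/// Standalone mode: frontend and datanode hold the same catalog instance.
fn standalone() -> (Frontend, Datanode) {
    let catalog: Arc<dyn CatalogManager> = Arc::new(LocalCatalogManager::default());
    (Frontend { catalog: Arc::clone(&catalog) }, Datanode { catalog })
}
```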
In #455, we built a set of `handle_xxx` APIs for protocols. Protocol handlers should not be aware of the gRPC client directly (e.g. `instance.admin()`, `instance.database()`); instead, they convert requests from different protocols into `InsertExpr`/`Select`/`CreateExpr`/`CreateDatabaseExpr`... and call these `handle_xxx` APIs.
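As an example of such a conversion, the sketch below turns one InfluxDB line-protocol line into a protocol-neutral insert request without touching any gRPC client. `InsertRequest` and `line_to_insert` are hypothetical (a stand-in for `InsertExpr`), and the parser is deliberately simplified: no escaping, no quoted string fields, numeric fields only, and a mandatory timestamp. It is not a full line-protocol implementation.

```rust
/// Hypothetical protocol-neutral insert request (a stand-in for `InsertExpr`).
#[derive(Debug, PartialEq)]
struct InsertRequest {
    table: String,
    tags: Vec<(String, String)>,
    fields: Vec<(String, f64)>,
    timestamp: i64,
}

/// Converts one simplified line: `measurement[,tag=v...] field=v[,field=v...] timestamp`.
fn line_to_insert(line: &str) -> Result<InsertRequest, String> {
    let mut parts = line.split(' ');
    let series = parts.next().ok_or("missing measurement")?;
    let fields_part = parts.next().ok_or("missing fields")?;
    let ts_part = parts.next().ok_or("missing timestamp")?;

    // The measurement becomes the table name; the remaining comma-separated pairs are tags.
    let mut series_iter = series.split(',');
    let table = series_iter.next().unwrap_or("").to_string();
    let tags = series_iter
        .map(|kv| {
            kv.split_once('=')
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .ok_or_else(|| format!("bad tag: {kv}"))
        })
        .collect::<Result<Vec<_>, String>>()?;

    // Fields must be numeric in this simplified version.
    let fields = fields_part
        .split(',')
        .map(|kv| -> Result<(String, f64), String> {
            let (k, v) = kv.split_once('=').ok_or_else(|| format!("bad field: {kv}"))?;
            let v: f64 = v.parse().map_err(|_| format!("non-numeric field: {kv}"))?;
            Ok((k.to_string(), v))
        })
        .collect::<Result<Vec<_>, String>>()?;

    let timestamp: i64 = ts_part
        .parse()
        .map_err(|_| format!("bad timestamp: {ts_part}"))?;
    Ok(InsertRequest { table, tags, fields, timestamp })
}
```

The resulting request would then be handed to `handle_insert`, so the line-protocol handler never needs to know how the datanode is reached.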
This `handle_xxx` API set could be extracted into a trait, maybe `FrontendRequestHandler`? Then we could provide two `FrontendRequestHandler` implementations, based on method invocation and gRPC respectively. Those `XxxxExpr` types serve as an intermediate representation for all requests. Currently we only have one implementation, based on gRPC, which uses a loopback address as the datanode address in standalone mode; this may introduce serialization overhead.
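A sketch of what the extracted trait and its two implementations could look like, assuming a simplified `InsertRequest` in place of `InsertExpr` and a hand-rolled byte encoding in place of gRPC/Protobuf. `LocalHandler`, `RemoteHandler`, `encode`, and `decode` are hypothetical; `RemoteHandler` calls an in-process `Datanode` after an encode/decode round-trip only to make the serialization step visible, whereas a real implementation would send the bytes over the network.

```rust
use std::sync::Mutex;

/// Hypothetical, simplified insert request (a stand-in for `InsertExpr`).
#[derive(Debug, Clone, PartialEq)]
struct InsertRequest {
    table: String,
    values: Vec<i64>,
}

/// Stand-in datanode that only counts the rows it has received.
#[derive(Default)]
struct Datanode {
    rows: Mutex<usize>,
}

impl Datanode {
    fn insert(&self, req: InsertRequest) -> usize {
        let n = req.values.len();
        *self.rows.lock().unwrap() += n;
        n
    }
}

/// The extracted `handle_xxx` API set; only insert is shown.
trait FrontendRequestHandler {
    fn handle_insert(&self, req: InsertRequest) -> Result<usize, String>;
}

/// Standalone mode: a plain method call with no serialization.
struct LocalHandler<'a> {
    datanode: &'a Datanode,
}

impl FrontendRequestHandler for LocalHandler<'_> {
    fn handle_insert(&self, req: InsertRequest) -> Result<usize, String> {
        Ok(self.datanode.insert(req))
    }
}

/// Distributed mode: the request is encoded and decoded as if it crossed the network.
struct RemoteHandler<'a> {
    datanode: &'a Datanode, // a real implementation would hold a gRPC client instead
}

impl FrontendRequestHandler for RemoteHandler<'_> {
    fn handle_insert(&self, req: InsertRequest) -> Result<usize, String> {
        let bytes = encode(&req); // what would be sent over the wire
        let decoded = decode(&bytes)?; // what the datanode would reconstruct
        Ok(self.datanode.insert(decoded))
    }
}

/// Format: u32 LE table-name length, table-name bytes, then each value as 8 LE bytes.
fn encode(req: &InsertRequest) -> Vec<u8> {
    let mut buf = Vec::with_capacity(4 + req.table.len() + 8 * req.values.len());
    buf.extend_from_slice(&(req.table.len() as u32).to_le_bytes());
    buf.extend_from_slice(req.table.as_bytes());
    for v in &req.values {
        buf.extend_from_slice(&v.to_le_bytes());
    }
    buf
}

fn decode(buf: &[u8]) -> Result<InsertRequest, String> {
    let h = buf.get(..4).ok_or("truncated header")?;
    let len = u32::from_le_bytes([h[0], h[1], h[2], h[3]]) as usize;
    let table_bytes = buf.get(4..4 + len).ok_or("truncated table name")?;
    let table = String::from_utf8(table_bytes.to_vec()).map_err(|e| e.to_string())?;
    let rest = &buf[4 + len..];
    if rest.len() % 8 != 0 {
        return Err("truncated value".to_string());
    }
    let values = rest
        .chunks_exact(8)
        .map(|c| {
            let mut b = [0u8; 8];
            b.copy_from_slice(c);
            i64::from_le_bytes(b)
        })
        .collect();
    Ok(InsertRequest { table, values })
}
```

A protocol handler would hold a `&dyn FrontendRequestHandler` (or be generic over it), so standalone mode could plug in `LocalHandler` and skip the encode/decode step entirely, while distributed mode uses the remote variant.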
Another issue is the intermediate representation. As discussed before, `XxxxExpr` is rather primitive now, especially `InsertExpr`, which directly serializes data points to bytes. If we use `InsertExpr` as the intermediate representation, it will introduce a double serialization/deserialization problem.
We need to either reimplement `InsertExpr` or use another representation for all insert requests in the frontend.
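One possible direction for "another representation" is to keep insert data as typed columns in memory, so that bytes are only produced if a request actually leaves the process. The sketch below converts row-oriented input into such a columnar request. `Cell`, `Values`, `Column`, `ColumnarInsert`, and `rows_to_columns` are hypothetical; a column's type is fixed by its first value, and a cell missing from a row becomes `None`.

```rust
/// A cell as it arrives from a row-oriented protocol.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Cell {
    Int64(i64),
    Float64(f64),
}

/// Hypothetical typed column vector; `None` marks a missing cell.
#[derive(Debug, Clone, PartialEq)]
enum Values {
    Int64(Vec<Option<i64>>),
    Float64(Vec<Option<f64>>),
}

#[derive(Debug, PartialEq)]
struct Column {
    name: String,
    values: Values,
}

/// Hypothetical columnar insert request: typed vectors, no pre-encoded bytes.
#[derive(Debug, PartialEq)]
struct ColumnarInsert {
    table: String,
    columns: Vec<Column>,
    row_count: usize,
}

fn rows_to_columns(table: &str, rows: &[Vec<(&str, Cell)>]) -> Result<ColumnarInsert, String> {
    // Pass 1: collect columns in order of first appearance, typed by their first value.
    let mut columns: Vec<Column> = Vec::new();
    for row in rows {
        for (name, cell) in row {
            if !columns.iter().any(|c| c.name == *name) {
                let values = match cell {
                    Cell::Int64(_) => Values::Int64(Vec::with_capacity(rows.len())),
                    Cell::Float64(_) => Values::Float64(Vec::with_capacity(rows.len())),
                };
                columns.push(Column { name: name.to_string(), values });
            }
        }
    }
    // Pass 2: fill every column for every row; absent cells become None.
    for row in rows {
        for col in &mut columns {
            let cell = row.iter().find(|(n, _)| *n == col.name).map(|(_, c)| *c);
            match (&mut col.values, cell) {
                (Values::Int64(v), Some(Cell::Int64(x))) => v.push(Some(x)),
                (Values::Float64(v), Some(Cell::Float64(x))) => v.push(Some(x)),
                (Values::Int64(v), None) => v.push(None),
                (Values::Float64(v), None) => v.push(None),
                _ => return Err(format!("type mismatch in column {}", col.name)),
            }
        }
    }
    Ok(ColumnarInsert { table: table.to_string(), columns, row_count: rows.len() })
}
```

In standalone mode such a request could be moved to the datanode as-is; only the gRPC path would need to encode it, which avoids serializing the same data twice.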