Table engine supports multi-region
**What type of enhancement is this?**

Tech debt reduction
**What does the enhancement do?**
Currently in GreptimeDB, only one region can be created for a table on one datanode, which causes data corruption when the number of table partitions exceeds the number of datanodes.
In fact, the table engine supports creating multiple regions for one table, but we have to go through all the components on datanodes and check whether they can accommodate multiple regions.
**Implementation challenges**
To support multiple regions on a datanode, both standalone and distributed modes should be considered.
In distributed mode, requests are parsed on the frontend, and the frontend forwards queries to datanodes. Thus, a datanode's incoming requests must carry region info, for example, the region to insert into, the regions to query from, etc. It should be relatively easy to add these parameters.
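A minimal sketch of what "requests carry region info" could look like. The struct and its fields are illustrative assumptions, not GreptimeDB's actual request types; only `RegionNumber` is a name taken from the discussion below.

```rust
// Hypothetical sketch: a datanode insert request that names its target
// region, so the datanode knows which region to write to without
// re-evaluating partition rules. Fields are illustrative placeholders.
type RegionNumber = u32;

#[derive(Debug)]
struct InsertRequest {
    table_name: String,
    // Region chosen by the frontend according to the table's partition rule.
    region_number: RegionNumber,
    // Placeholder payload: (timestamp, value) pairs.
    rows: Vec<(i64, f64)>,
}

fn main() {
    let req = InsertRequest {
        table_name: "metrics".to_string(),
        region_number: 2,
        rows: vec![(1_690_000_000, 0.5)],
    };
    assert_eq!(req.region_number, 2);
    println!("insert {} rows into region {}", req.rows.len(), req.region_number);
}
```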
In standalone mode, things get a bit tricky. If the datanode has only one region, the query engine is not aware of partition rules. But if the datanode has multiple regions for one table, the query engine should associate requests with the corresponding regions according to partition rules. There are two problems here:
- In standalone mode, the catalog manager does not support storing partition rules; we must add an `options` field to `SystemCatalogTable`: https://github.com/GreptimeTeam/greptimedb/blob/8402ef94af0043351cceecfa7620e0d8fe320be6/src/catalog/src/system.rs#L49
- Find regions for queries according to partition rules. Currently this is implemented in `DistTable`, but in standalone mode requests are forwarded directly to the datanode. Thus we need to extract the partition logic into a separate struct so it can be reused in both distributed and standalone modes: https://github.com/GreptimeTeam/greptimedb/blob/8402ef94af0043351cceecfa7620e0d8fe320be6/src/frontend/src/table.rs#L85-L95
`MitoTable` should be modified to accommodate multiple regions, and its `create`/`open`/`insert`/`select` methods must explicitly provide regions.
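A toy model of what "methods must explicitly provide regions" could mean in practice. `MultiRegionTable` and its methods are hypothetical stand-ins, not `MitoTable`'s real API.

```rust
// Illustrative sketch only: a table that owns several regions and whose
// create/insert operations name regions explicitly. Not GreptimeDB code.
use std::collections::HashMap;

type RegionNumber = u32;

struct MultiRegionTable {
    // region number -> rows (rows are placeholder i64 values here).
    regions: HashMap<RegionNumber, Vec<i64>>,
}

impl MultiRegionTable {
    /// `create` receives the full set of regions this datanode hosts.
    fn create(region_numbers: &[RegionNumber]) -> Self {
        let regions = region_numbers.iter().map(|r| (*r, Vec::new())).collect();
        MultiRegionTable { regions }
    }

    /// `insert` targets one explicit region; writing to an unknown
    /// region is an error instead of silently corrupting data.
    fn insert(&mut self, region: RegionNumber, row: i64) -> Result<(), String> {
        self.regions
            .get_mut(&region)
            .ok_or_else(|| format!("region {region} not found"))?
            .push(row);
        Ok(())
    }
}

fn main() {
    let mut table = MultiRegionTable::create(&[0, 1]);
    assert!(table.insert(1, 42).is_ok());
    assert!(table.insert(7, 42).is_err());
}
```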
Currently all partition rules are parsed in the frontend, and the datanode does not validate whether inserted data satisfies the partition rule, but we may add this validation once repartitioning is supported.
As for table scan, the current `Table::scan` method does not provide a region argument:
https://github.com/GreptimeTeam/greptimedb/blob/c144a1b20ea775af1b174beec4e3dcbfcbb277f6/src/table/src/table.rs#L55-L65
Maybe we should add a `regions: Option<Vec<RegionNumber>>` argument.
> Maybe we should add a `regions: Option<Vec<RegionNumber>>` argument.

`Vec<RegionNumber>` should be adequate, as `None` and an empty vector should represent the same thing.
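The point that `None` and an empty vector are interchangeable here could be sketched as a small normalization helper. The function name and semantics ("empty means scan all regions") are assumptions for illustration.

```rust
// Sketch: treat None and Some(vec![]) identically ("scan all regions"),
// which is why a plain Vec<RegionNumber> would suffice in the signature.
type RegionNumber = u32;

fn regions_to_scan(
    requested: Option<Vec<RegionNumber>>,
    all: &[RegionNumber],
) -> Vec<RegionNumber> {
    match requested {
        // An explicit, non-empty request wins.
        Some(r) if !r.is_empty() => r,
        // None or an empty vector: scan every region.
        _ => all.to_vec(),
    }
}

fn main() {
    let all = [0, 1, 2];
    assert_eq!(regions_to_scan(None, &all), vec![0, 1, 2]);
    assert_eq!(regions_to_scan(Some(vec![]), &all), vec![0, 1, 2]);
    assert_eq!(regions_to_scan(Some(vec![1]), &all), vec![1]);
}
```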
Also, if there are regions with different schemas in a table, what will be the schema of the record batches yielded? Can we compare the schema versions of all regions and pick the schema with the min version?
> Can we compare the versions of schemas of all regions and find the schema with the min version?

This might be the most applicable way before #275 is done. BTW, this reminds me that we need to move the version field somewhere else, as mentioned in #349.
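The "min version" idea is just a minimum over the regions' schema versions; a sketch, where versions stand in for full schemas (real schemas would carry columns as well):

```rust
// Hedged sketch of "find the schema with the min version": given each
// region's schema version, select the smallest one as the scan schema.
type SchemaVersion = u32;

fn min_version_schema(region_versions: &[SchemaVersion]) -> Option<SchemaVersion> {
    // None when the table has no regions at all.
    region_versions.iter().copied().min()
}

fn main() {
    assert_eq!(min_version_schema(&[3, 1, 2]), Some(1));
    assert_eq!(min_version_schema(&[]), None);
}
```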
But you may need to handle the `projection` field carefully and resolve the schema difference between table and region. What about returning an error if regions have different schemas in the first PR and leaving some work to the next PR?
Projection is an issue. Looks better to just throw an error for now.
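The "error for now" approach discussed above could be sketched as a pre-scan check. The function name and error message are illustrative, not GreptimeDB's actual API.

```rust
// Sketch: before scanning, verify all regions share one schema version
// and bail out otherwise, deferring mixed-schema scans to a later PR.
type SchemaVersion = u32;

fn check_uniform_schema(versions: &[SchemaVersion]) -> Result<SchemaVersion, String> {
    let first = *versions.first().ok_or("table has no regions")?;
    if versions.iter().all(|v| *v == first) {
        Ok(first)
    } else {
        Err("regions have different schemas; scan not supported yet".to_string())
    }
}

fn main() {
    assert_eq!(check_uniform_schema(&[2, 2]), Ok(2));
    assert!(check_uniform_schema(&[1, 2]).is_err());
}
```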
Looks like region failover (#1126) also requires multi-region support.
Closing this issue, as version 0.4 already supports multi-region.