iotdb
[IOTDB-28] Calcite Integration for IoTDB
IoTDB-Calcite Adapter.
IoTDB - Calcite Adapter Feature Documentation
Relational table structure
The relational table structure used in the IoTDB - Calcite Adapter is:
| time | device | sensor1 | sensor2 | sensor3 | ... |
|---|---|---|---|---|---|
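Each storage group in IoTDB is treated as one table. Its columns are the time and device columns plus the union of the sensors of all devices in that storage group. Sensors with the same name on different devices should have the same data type.
For example, consider the IoTDB storage group root.sg, with the following devices and their sensors: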
- d1 -> s1, s2
- d2 -> s2, s3
- d3 -> s1, s4
The table name in the IoTDB - Calcite Adapter is then root.sg, and its table structure is:
| time | device | s1 | s2 | s3 | s4 |
|---|---|---|---|---|---|
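The column derivation described above can be sketched in a few lines of Java. This is only an illustration: the class `SchemaSketch`, the method `columnsFor`, and the choice to sort sensor names alphabetically are assumptions made for the example, not names or behavior taken from the adapter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not part of the adapter.
class SchemaSketch {
  /** Returns time, device, then the union of all sensor names (sorted here by assumption). */
  static List<String> columnsFor(Map<String, List<String>> sensorsByDevice) {
    TreeSet<String> sensors = new TreeSet<>();
    for (List<String> deviceSensors : sensorsByDevice.values()) {
      sensors.addAll(deviceSensors); // the set removes duplicates such as s1 and s2
    }
    List<String> columns = new ArrayList<>();
    columns.add("time");
    columns.add("device");
    columns.addAll(sensors);
    return columns;
  }

  public static void main(String[] args) {
    Map<String, List<String>> sg = new LinkedHashMap<>();
    sg.put("d1", Arrays.asList("s1", "s2"));
    sg.put("d2", Arrays.asList("s2", "s3"));
    sg.put("d3", Arrays.asList("s1", "s4"));
    System.out.println(columnsFor(sg)); // prints [time, device, s1, s2, s3, s4]
  }
}
```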
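How it works
This section briefly describes how the IoTDB - Calcite Adapter works.
After the input SQL statement is parsed and validated by Calcite, it is matched against the optimization (pushdown) rules defined in IoTDBRules. Nodes that can be pushed down are converted accordingly, yielding a SQL statement that can be executed on the IoTDB side; that query is then executed in IoTDB to fetch the source data. Nodes that cannot be pushed down are executed with Calcite's default physical plan. Finally, IoTDBEnumerator iterates over the result set to obtain the results.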
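Query introduction
The pushdown rules currently defined in IoTDBRules are: IoTDBProjectRule, IoTDBFilterRule, IoTDBLimitRule.
IoTDBProjectRule
IoTDBProjectRule pushes the projected columns of a query down to IoTDB for execution.
For example (the SQL statements below are all taken from the tests):
- Wildcard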
select * from "root.vehicle"
The wildcard * is kept as is during conversion rather than being expanded into column names, so the resulting IoTDB query is:
select * from root.vehicle.* align by device
- Non-wildcard sensor columns
select s0 from "root.vehicle"
is converted to:
select s0 from root.vehicle.* align by device
- Non-wildcard, non-sensor columns
select "time", device, s2 from "root.vehicle"
The time and device columns in this statement do not need to appear in an IoTDB query, so the conversion removes them, and the resulting IoTDB query is:
select s2 from root.vehicle.* align by device
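In particular, if the query contains only the time and device columns, the projection is converted to the wildcard *.
- Alias
Currently the IoTDB - Calcite Adapter only supports renaming projected columns in the SELECT clause; the new names cannot be used in subsequent clauses.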
select "time" AS t, device AS d, s2 from "root.vehicle"
In the result, the time column is named t and the device column is named d.
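As a rough illustration of the projection conversions listed for IoTDBProjectRule, the sketch below rewrites a list of projected column names into the IoTDB-side query text. It works on plain strings rather than Calcite RelNodes, and the class `ProjectSketch` and method `toIoTDBQuery` are hypothetical names chosen for this example; the real rule may be structured quite differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the adapter.
class ProjectSketch {
  /** Builds the IoTDB query text for the given projected columns of a storage group. */
  static String toIoTDBQuery(List<String> projectedColumns, String storageGroup) {
    List<String> sensors = new ArrayList<>();
    for (String column : projectedColumns) {
      // time and device are not part of the IoTDB select list, so they are dropped.
      if (!column.equals("time") && !column.equals("device")) {
        sensors.add(column); // a literal "*" passes through unchanged
      }
    }
    // If only time and/or device were requested, fall back to the wildcard.
    String select = sensors.isEmpty() ? "*" : String.join(", ", sensors);
    return "select " + select + " from " + storageGroup + ".* align by device";
  }

  public static void main(String[] args) {
    System.out.println(toIoTDBQuery(Arrays.asList("*"), "root.vehicle"));
    System.out.println(toIoTDBQuery(Arrays.asList("time", "device", "s2"), "root.vehicle"));
    System.out.println(toIoTDBQuery(Arrays.asList("time", "device"), "root.vehicle"));
  }
}
```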
IoTDBFilterRule
IoTDBFilterRule pushes the WHERE clause of a query down to IoTDB for execution.
- The WHERE clause does not restrict the device column
select * from "root.vehicle" where "time" < 10 AND s0 >= 150
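The time column is left unchanged. Because no specific device is restricted, sensor columns are not concatenated with a device name, and the resulting IoTDB query is: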
select * from root.vehicle.* where time < 10 AND s0 >= 150
- The WHERE clause restricts the device column
- Only a single device is restricted
select * from "root.vehicle" where device = 'root.vehicle.d0' AND "time" > 10 AND s0 <= 100
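If the WHERE clause restricts only a single device and all other conditions apply to that device, the query is converted into a query on that device in IoTDB. The query above is converted to: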
select * from root.vehicle.d0 where time > 10 AND s0 <= 100
- Multiple devices are restricted
select * from "root.vehicle" where (device = 'root.vehicle.d0' AND "time" <= 1) OR (device = 'root.vehicle.d1' AND s0 < 100)
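If the WHERE clause restricts multiple devices, the query is converted into multiple query statements, one per device according to that device's conditions.
For example, the query above is converted into two SQL statements executed in IoTDB: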
select * from root.vehicle.d0 where time <= 1
select * from root.vehicle.d1 where s0 < 100
- Both device-specific conditions and a global condition
select * from "root.vehicle" where (device = 'root.vehicle.d0' AND "time" <= 1) OR s0 = 999
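In the SQL statement above, besides the separate restriction on device root.vehicle.d0, there is also the condition s0 = 999. This condition is treated as a global condition: any device that satisfies it is considered a correct result.
The query above is therefore converted into a query over all devices in the storage group: devices with their own conditions are handled separately, and the remaining devices are queried together using the global condition.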
select * from root.vehicle.d0 where time <= 1 OR s0 = 999
select * from root.vehicle.d1 where s0 = 999
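Note: the test happens to contain only two devices. If there were another device d2, root.vehicle.d2 would be added to the FROM clause rather than issuing a separate query for d2.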
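The device-splitting behaviour described for IoTDBFilterRule can be sketched as follows. This is a simplified model under stated assumptions: conditions are already split into per-device strings plus an optional global string, the list of all devices in the storage group is known, and the names `FilterSketch` and `toIoTDBQueries` are hypothetical. The actual rule operates on Calcite expression trees.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the adapter.
class FilterSketch {
  /**
   * perDevice maps a full device path to the condition that applies only to it.
   * global is a condition that applies to every device, or null if there is none.
   */
  static List<String> toIoTDBQueries(String storageGroup, List<String> allDevices,
      Map<String, String> perDevice, String global) {
    List<String> queries = new ArrayList<>();
    if (perDevice.isEmpty()) {
      // No device restriction: query the whole storage group.
      String where = global == null ? "" : " where " + global;
      queries.add("select * from " + storageGroup + ".*" + where);
      return queries;
    }
    // One query per restricted device; a global condition is OR-ed onto its own condition.
    for (Map.Entry<String, String> entry : perDevice.entrySet()) {
      String condition = global == null ? entry.getValue() : entry.getValue() + " OR " + global;
      queries.add("select * from " + entry.getKey() + " where " + condition);
    }
    // Remaining devices share a single query that uses only the global condition.
    if (global != null) {
      List<String> remaining = new ArrayList<>(allDevices);
      remaining.removeAll(perDevice.keySet());
      if (!remaining.isEmpty()) {
        queries.add("select * from " + String.join(", ", remaining) + " where " + global);
      }
    }
    return queries;
  }

  public static void main(String[] args) {
    Map<String, String> perDevice = new LinkedHashMap<>();
    perDevice.put("root.vehicle.d0", "time <= 1");
    List<String> devices = Arrays.asList("root.vehicle.d0", "root.vehicle.d1", "root.vehicle.d2");
    for (String query : toIoTDBQueries("root.vehicle", devices, perDevice, "s0 = 999")) {
      System.out.println(query);
    }
  }
}
```

With the three devices in `main`, the second generated query lists both root.vehicle.d1 and root.vehicle.d2 in its FROM clause, which matches the note about an additional device d2.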
@Alima777 Thanks for your contribution, I will have a review this month.
Thank you. I'll wait for your review.
Hi, @yuqi1129 would you like to have a review of this PR? :D
@Test
public void testFilter7() {
  CalciteAssert.that()
      .with(MODEL)
      .with("UnQuotedCasing", IoTDBConstant.UnQuotedCasing)
      .query("select * from \"root.vehicle\" " +
          "where (device = 'root.vehicle.d0' AND \"time\" <= 1) OR s2 = 2.22 and 1 = 1")
      .returns("time=1; device=root.vehicle.d0; s0=101; s1=1101; s2=null; s3=null; s4=null\n" +
          "time=2; device=root.vehicle.d0; s0=10000; s1=40000; s2=2.22; s3=null; s4=null\n" +
          "time=2222; device=root.vehicle.d1; s0=null; s1=null; s2=2.22; s3=null; s4=null\n")
      .explainContains("PLAN=IoTDBToEnumerableConverter\n" +
          "  IoTDBFilter(condition=[OR(AND(=($1, 'root.vehicle.d0'), <=($0, 1)), =(CAST($4):DOUBLE NOT NULL, 2.22))])\n" +
          "    IoTDBTableScan(table=[[IoTDBSchema, root.vehicle]])");
}
We should add rules to fold constant expressions.
For example, where 1 = 1 and a = 1 should be folded to where a = 1; see the test above.
@yuqi1129 Hi, thank you very much for your patient review!
Since this was the first large module I implemented, I overlooked many coding standards, such as exception handling, error logging, and consistent code style. It also has not been maintained for a long time.
So thanks again! I've fixed the issues; please take a look.
Hi all, this project was developed based on Calcite and Avatica. Calcite has some problems:
- For simple query optimization, using Calcite is too heavy.
- When Calcite generates the execution plan tree, some class files are generated, which may cause memory overflow. This happens when there are too many fields in the search criteria. For example, select * from a in (1, 2, 3, 4.....500) will cause an OOM.
@wangchao316, hi. Regarding your problems, as far as I know we could do the following:
- Skip the optimization stage. That is, use only the parser to convert the SQL to an AST, then validate it and convert the AST to a logical RelNode. The built-in Avatica logic is indeed heavy for simple queries, especially time-series SQL. Solving this may require changing the Calcite source code and maintaining our own version.
- This seems to be a problem with the SQL itself. We do not recommend putting too many values in an IN clause; because of the Java language's limits on fields, we can't solve it well.
The above is my personal view; discussion is welcome.