TDengine icon indicating copy to clipboard operation
TDengine copied to clipboard

(v3.2.3.0) Adding "limit 10" to a partitioned query reduces performance by 1,000x and returns DB error

Open maxianzhen opened this issue 1 year ago • 6 comments

Performance Issue Any questions related to TDengine's performance can be discussed here.

Problem Description 对meters超级表进行数据切分查询仅0.145秒就返回10000条记录,但是对查询语句加limit 10后 耗时147秒后返回DB error 性能下降幅度147s/0.145 ~= 1013 个人猜测增加limit 10条件后是不是尝试拉取全部数据本地缓存后再做limit过滤?

To Reproduce Steps to reproduce the behavior:

  1. 腾讯云轻量级服务器安装TDengine-server-3.2.3.0-Linux-x64.rpm包
  2. 使用taosBenchmark生成meters表及数据(1亿条)
  3. 参考官方文档“数据切分查询”章节示例 https://docs.taosdata.com/taos-sql/distinguished/
  4. 执行如下SQL: select _wstart, location, max(current) from meters partition by location interval(10m),用时0.145秒,默认返回10000条记录
  5. 但是当我试图limit 10条时 : select _wstart, location, max(current) from meters partition by location interval(10m) limit 10; 查询耗时达145秒,而且返回DB error: No space left on device (147.183304s)的报错 df -h确认过,可用空间还是有富余的

Screenshots image

Environment (please complete the following information):

  • OS: [CentOS 7.0]
  • Memory, CPU, current Disk Space: 2G,2core,50G
  • TDengine Version [3.2.3.0]

maxianzhen avatar Mar 29 '24 06:03 maxianzhen

image image

maxianzhen avatar Mar 29 '24 07:03 maxianzhen

因为是初次体验,可能是使用上或某些配置上不当的原因,发现加上limit 查询时也会导致taosd进程消失。

taosdlog.0: 03/29 21:42:11.698439 00004930 UTL default gitinfo e27fdcff254b7bd0e0ad2f825e0414da4c0f37dc 03/29 21:42:11.698441 00004930 UTL default buildinfo Built Linux-x64 at 2024-02-29 18:01:09 +0800 03/29 21:42:11.698443 00004930 UTL ================================================================= 03/29 21:42:11.701717 00004930 DND start to init dnode env 03/29 21:42:11.702057 00004930 DND succceed to read dnode file /var/lib/taos//dnode/dnode.json 03/29 21:42:11.702068 00004930 DND ERROR dnode:1, localEp bigtide:6030 different from localhost:6030 03/29 21:42:11.702072 00004930 DND ERROR localEp bigtide:6030 different with /var/lib/taos//dnode/dnode.json and need to be reconfigured 03/29 21:42:11.702161 00004930 DND ERROR failed to read file since Invalid config option 03/29 21:42:11.702172 00004930 DND ERROR failed to create dnode since Invalid config option 03/29 21:42:11.702174 00004930 DND ERROR failed to init dnode since Invalid config option

taoslog0.0: 03/29 21:46:05.634183 00005871 RPC ERROR TSC msg connect failed to send, conn 0x7fb3c0000930 failed to connect to bigtide:6030, reason: connection refused, gtid:0xf9b6e575cb390001:0x7be528a75cb00002 03/29 21:46:06.635373 00005871 RPC ERROR TSC msg connect failed to send, conn 0x7fb3c0000930 failed to connect to bigtide:6030, reason: connection refused, gtid:0xf9b6e575cb390001:0x7be528a75cb00002 03/29 21:46:06.635466 00005871 RPC ERROR TSC msg connect failed to send, conn 0x7fb3c0000930 failed to connect to bigtide:6030, reason: connection refused, gtid:0xf9b6e575cb390001:0x7be528a75cb00002 03/29 21:46:07.636658 00005871 RPC ERROR TSC msg connect failed to send, conn 0x7fb3c0000930 failed to connect to bigtide:6030, reason: connection refused, gtid:0xf9b6e575cb390001:0x7be528a75cb00002 03/29 21:46:07.636752 00005871 RPC ERROR TSC msg connect failed to send, conn 0x7fb3c0000930 failed to connect to bigtide:6030, reason: connection refused, gtid:0xf9b6e575cb390001:0x7be528a75cb00002 03/29 21:46:08.637841 00005871 RPC ERROR TSC msg connect failed to send, conn 0x7fb3c0000930 failed to connect to bigtide:6030, reason: connection refused, gtid:0xf9b6e575cb390001:0x7be528a75cb00002 03/29 21:46:08.637936 00005871 RPC ERROR TSC msg connect failed to send, conn 0x7fb3c0000930 failed to connect to bigtide:6030, reason: connection refused, gtid:0xf9b6e575cb390001:0x7be528a75cb00002 03/29 21:46:09.639162 00005871 RPC ERROR TSC msg connect failed to send, conn 0x7fb3c0000930 failed to connect to bigtide:6030, reason: connection refused, gtid:0xf9b6e575cb390001:0x7be528a75cb00002 03/29 21:46:09.639340 00005861 TSC ERROR failed to connect to server, reason: Unable to establish connection 03/29 21:46:09.839650 00005861 QRY start to destroy catalog 03/29 21:46:09.839731 00005865 QRY catalog update thread stopped 03/29 21:46:09.839767 00005861 QRY catalog destroyed 03/29 21:46:10.040603 00005861 TSC all local resources released image

maxianzhen avatar Mar 29 '24 13:03 maxianzhen

查看文档有explain命令,生成添加”limit 10”前后的执行计划进行对比后发现:添加limitq前后的执行计划差异明显 image

maxianzhen avatar Mar 30 '24 13:03 maxianzhen

大略通读3.0官方文档,发现有几个点是TDengine特色的。 首先,limit是作用在子表,并不是限制最终结果集。而我在示例sql上增加limit 10,原是预期按传统数据库的方式,在回显的返回结果集上只显示符合条件的10条记录。 其次,有个SMA的预计算概念,在数据文件的头部信息会记录数据库的min/max/sum的信息。

有个疑问: select _wstart, location, max(current) from meters partition by location interval(10m) limit 10; 当在示例sql上加了limit之后,因为limit是作用到子表上限制为10条,那会不会导致这个数据块的头部信息的SMA记录在limit查询中不起到作用哪?(新人刚刚接触TDengine,个人猜测)

maxianzhen avatar Mar 31 '24 11:03 maxianzhen

您好,可以加我们微信a15652223354

我们给您具体排查一下吧。

yu285 avatar Apr 17 '24 03:04 yu285

您好,可以加我们微信a15652223354

我们给您具体排查一下吧。

谢谢!不需要了。我是因为恰巧月初投了一份简历就下载了单机版部署了解了解tdengine。从当时看到的信息来分析,应该是单机模式的limit算子下推的原因,可能部分逻辑在走集群模式,而且单机模式partition没有参与执行计划(或已内部优化)

maxianzhen avatar Apr 17 '24 06:04 maxianzhen