canal: efficiency problem with GROUP BY on the child table when syncing a one-to-many join query to ES

Open gzjackyguan opened this issue 1 year ago • 1 comments

The official configuration documentation gives the following example:

select a.id as _id, a.name, a.role_id, c.labels, a.c_time
from user a
left join (select user_id, group_concat(label order by id desc separator ';') as labels from label group by user_id) c on c.user_id=a.id

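The child table here is indexed on user_id, but the subquery uses GROUP BY, and user_id is only applied afterwards as the join condition. According to the EXPLAIN output, the subquery actually performs a full scan: because the subquery runs first, the aggregation executes without any user_id condition to constrain it.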

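In my own test the child table has about 900,000 rows and the main table about 3 million. Querying the child table alone takes 0.33 s, and the whole query takes 1.7 s.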

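I then optimized the query to:

select a.id as _id, a.name, a.role_id, a.c_time,
  my_group_cat(c.user_id) as labels, -- a custom function that implements the equivalent of group_concat(label order by id desc separator ';')
  count(1) as sub_count -- meaningless in itself; it is only there so that multiple child rows collapse into a single output row
from user a
left join label c on c.user_id=a.id

In my tests this is much faster, only 0.003 s, which meets the speed required for syncing.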
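To make the difference between the two query shapes concrete, here is a minimal sketch that runs both against a tiny in-memory database and checks that they return the same labels per user. Several things in it are assumptions and not part of the original report: SQLite (via Python's `sqlite3`) stands in for MySQL, the built-in `group_concat` replaces the custom `my_group_cat`, an explicit `group by a.id` is added to the flattened query, and the sample rows are made up. It only illustrates result equivalence; it says nothing about MySQL's execution plan, the timings above, or whether canal adapter accepts either form.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the queries in the post.
SCHEMA_AND_DATA = """
create table user (id integer primary key, name text, role_id integer, c_time text);
create table label (id integer primary key, user_id integer, label text);
create index idx_label_user_id on label (user_id);
insert into user values (1, 'alice', 10, '2024-01-01'),
                        (2, 'bob',   20, '2024-01-02'),
                        (3, 'carol', 30, '2024-01-03');
insert into label values (1, 1, 'x'), (2, 1, 'y'), (3, 2, 'z');
"""

# Shape 1: aggregate the whole child table in a derived table, then join.
DERIVED_SQL = """
select a.id as _id, a.name, a.role_id, c.labels, a.c_time
from user a
left join (select user_id, group_concat(label, ';') as labels
           from label group by user_id) c on c.user_id = a.id
order by a.id
"""

# Shape 2: join the child table directly and aggregate in the outer query.
# The "group by a.id" is an assumption added for this sketch.
JOINED_SQL = """
select a.id as _id, a.name, a.role_id, group_concat(c.label, ';') as labels, a.c_time
from user a
left join label c on c.user_id = a.id
group by a.id
order by a.id
"""


def build_db():
    """Create the in-memory database with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


def fetch(conn, sql):
    """Run a query and normalize the labels column to a sorted tuple.

    SQLite does not guarantee concatenation order here, so the labels are
    compared as sorted tuples instead of raw strings.
    """
    rows = []
    for _id, name, role_id, labels, c_time in conn.execute(sql):
        label_set = tuple(sorted(labels.split(";"))) if labels else ()
        rows.append((_id, name, role_id, label_set, c_time))
    return rows


if __name__ == "__main__":
    conn = build_db()
    print(fetch(conn, DERIVED_SQL) == fetch(conn, JOINED_SQL))
```

The second shape lets the join on user_id restrict the child rows before aggregation, which is the idea behind the rewrite described above; whether the optimizer actually uses the index should be confirmed with EXPLAIN on the real MySQL tables.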

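But here is the problem: canal adapter does not report an error for this query, yet it never manages to execute and commit. I checked all the documented SQL-writing constraints carefully and the query satisfies them, and no error is raised by the query itself. Only after a fairly long time does the following appear:

2023-12-29 22:33:00.010 [Thread-3] ERROR c.a.otter.canal.adapter.launcher.loader.AdapterProcessor - com.alibaba.otter.canal.protocol.exception.CanalClientException: java.io.IOException: end of stream when reading header Error sync but ACK!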

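I tried different aggregate functions in the main query and none of them work. All I really want is a one-to-many join query, so I would like to know whether there is any way to optimize it.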

Jan 06 '24 03:01 gzjackyguan

I ran into the same problem: for one-to-many joins, the SQL provided in the official documentation is very slow.

Jan 23 '24 09:01 zhaofei1193
