
[Bug]: CDC keeps sending all old data as updates without newly inserted data.

Open cpegeric opened this issue 7 months ago • 2 comments

Is there an existing issue for the same bug?

  • [x] I have checked the existing issues.

Branch Name

main

Commit ID

27590cf

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

Sometimes CDC only gives me UPDATE events for all the old data, without the new INSERT data, and keeps looping.

No errors are found in the log, except "wait too long" and "unexpect watermark".


Expected Behavior

No response

Steps to Reproduce

use the repo

https://github.com/cpegeric/matrixone/tree/cdc_sqlexecutor_cleanup

with branch cdc_sqlexecutor_cleanup

Download the tool with

git clone git@github.com:cpegeric/wiki-benchmark.git

In MO,

> create database eric;

From command line,

% cd wiki-benchmark/python
% python indextest.py buildcdc 127.0.0.1 eric src hnswidx vector_l2_ops 128 1000000 hnsw


In MO,

select count(*) from src;

Logs,

Check logs/stderr-xxx for the log output.

Additional information

No response

cpegeric • Jun 10 '25 10:06

A clean start always works. However, after running drop cdc and drop pitr followed by create cdc and create pitr, the issue happens most of the time.

cpegeric • Jun 10 '25 11:06

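The watermark is persisted every 1s. On restart, reading resumes from the last persisted watermark, so there will be duplicate updates. While the first batch of data (i.e. the data that existed before the CDC task was created) is being sent, the watermark stays at 0, so after a restart all of the data is read again. Sending duplicate data without a restart has not been reproduced yet.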
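To illustrate the restart behavior described above, here is a minimal toy model in Python. It is not MatrixOne code: the function `run_until_stop`, the batch layout, and the timestamps are all hypothetical. The toy also assumes the watermark is persisted as soon as a batch completes, rather than on a 1s timer. It only shows why a watermark that is still 0 during the first batch causes everything to be re-sent after a restart.

```python
def run_until_stop(batches, persisted_watermark, stop_after_rows=None):
    """Toy CDC reader (hypothetical, not MatrixOne code).

    batches: list of (ts, [row, ...]) sorted by ts.
    The watermark only advances after a whole batch has been sent, so a stop
    in the middle of a batch leaves the watermark where it was.
    Returns (rows_sent, new_persisted_watermark).
    """
    sent = []
    watermark = persisted_watermark
    for ts, rows in batches:
        if ts <= watermark:
            continue  # this batch was fully sent before the restart
        for row in rows:
            if stop_after_rows is not None and len(sent) >= stop_after_rows:
                return sent, watermark  # simulated restart in mid-batch
            sent.append(row)
        watermark = ts  # batch complete: advance the watermark
    return sent, watermark


# The first batch stands for the data that existed before the CDC task was created.
batches = [(10, ["a", "b", "c", "d"]), (11, ["e"])]

# First run is interrupted after 3 rows; the watermark is still 0.
first_sent, wm = run_until_stop(batches, persisted_watermark=0, stop_after_rows=3)

# Second run resumes from the persisted watermark and re-sends the first batch.
second_sent, wm2 = run_until_stop(batches, persisted_watermark=wm)

duplicates = [row for row in second_sent if row in first_sent]
print("watermark after first run:", wm)
print("rows sent twice:", duplicates)
```

In this sketch every row sent before the interruption is sent again, which matches the repeated UPDATEs seen after a restart. It does not explain duplicates without a restart.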

jiangxinmeng1 • Jun 13 '25 01:06