paimon

[Bug] first-row merge engine: batch incremental count does not equal changelogRecordCount

Open · lppsuixn opened this issue 1 year ago · 1 comment

Search before asking

  • [X] I searched in the issues and found nothing similar.

Paimon version

0.7

Compute Engine

flink

Minimal reproduce step

Between snapshot-12 and snapshot-14 there are 408859167 changelog records, but the batch incremental count is 1545129599.

SELECT count(1) FROM my_catalog_paimon.db.table  /*+ OPTIONS('incremental-between' = '12,14') */ limit 10;


By contrast, the count filtered with scan.file-creation-time-millis is correct.

SELECT count(1) FROM my_catalog_paimon.db.table /*+ OPTIONS('scan.file-creation-time-millis' = '1714119900000','scan.infer-parallelism.max'='500') */


snapshot-12

{
  "version" : 3,
  "id" : 12,
  "schemaId" : 0,
  "baseManifestList" : "manifest-list-a0cd3036-c7a9-4d97-9b3b-4fe117a2806e-2",
  "deltaManifestList" : "manifest-list-a0cd3036-c7a9-4d97-9b3b-4fe117a2806e-3",
  "changelogManifestList" : "manifest-list-a0cd3036-c7a9-4d97-9b3b-4fe117a2806e-4",
  "commitUser" : "bb69057c-5a2b-4c8d-bf88-ebea7b373764",
  "commitIdentifier" : 9223372036854775807,
  "commitKind" : "COMPACT",
  "timeMillis" : 1714115951112,
  "logOffsets" : { },
  "totalRecordCount" : 949445145099,
  "deltaRecordCount" : -4538585,
  "changelogRecordCount" : 42198489518,
  "watermark" : -9223372036854775808
}

snapshot-14

{
  "version" : 3,
  "id" : 14,
  "schemaId" : 0,
  "baseManifestList" : "manifest-list-bf44c64a-80e6-410d-b43c-2d65b8c03823-2",
  "deltaManifestList" : "manifest-list-bf44c64a-80e6-410d-b43c-2d65b8c03823-3",
  "changelogManifestList" : "manifest-list-bf44c64a-80e6-410d-b43c-2d65b8c03823-4",
  "commitUser" : "b457cb4b-2a32-4ca7-be21-f50328894121",
  "commitIdentifier" : 9223372036854775807,
  "commitKind" : "COMPACT",
  "timeMillis" : 1714124210161,
  "logOffsets" : { },
  "totalRecordCount" : 949854004266,
  "deltaRecordCount" : -1136270432,
  "changelogRecordCount" : 408859167,
  "watermark" : -9223372036854775808
}
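A quick check on the figures posted above: the batch incremental count exceeds snapshot-14's changelogRecordCount by exactly the magnitude of snapshot-14's deltaRecordCount. This is arithmetic on the reported numbers only. Snapshot-13 is not shown, so it is an observation rather than a confirmed explanation of which files the default scan read.

$$
1\,545\,129\,599 - 408\,859\,167 = 1\,136\,270\,432 = \lvert \mathrm{deltaRecordCount}_{14} \rvert
$$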

What doesn't meet your expectations?

The batch incremental count should equal changelogRecordCount.

Anything else?

CREATE TABLE my_catalog_paimon.db.table (
  a bigint,
  b bigint,
  c string,
  d string,
  e string,
  f string,
  g string,
  h string,
  PRIMARY KEY (a, b) NOT ENFORCED
) WITH (
  'merge-engine' = 'first-row',
  'changelog-producer' = 'lookup',
  'bucket' = '5000'
);

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

lppsuixn · Apr 26 '24 14:04

You can use incremental-between-scan-mode = changelog to fix this.
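Applied to the reporter's query, the suggested option would go in the same dynamic-options hint as incremental-between. This is a sketch that reuses the catalog and table names from the report; only the option name and value come from the reply above, and the resulting count has not been verified here.

```sql
-- Sketch: read changelog files (rather than the default scan mode) for the
-- incremental range (12, 14]; table name reused from the report.
SELECT count(1)
FROM my_catalog_paimon.db.table
  /*+ OPTIONS(
        'incremental-between' = '12,14',
        'incremental-between-scan-mode' = 'changelog'
      ) */;
```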

I will change the default behavior.

JingsongLi · Apr 30 '24 09:04