DiffData reports "Missing target row found for key" although the data is OK
Hi,
After a CDM + ZDM migration run, I am running CDM in DiffData (validation) mode while ZDM is still running.
It reports a few errors such as: ERROR DiffJobSession: Missing target row found for key: [325081341794 %% 2025-10-16T13:34:02.492Z]
When I query origin and target for this specific key, the rows are identical, including the writetime, so everything seems to be OK.
However, the data appears to have been updated at the very moment DiffData ran (the writetime falls in the same second as the "Missing target row" log entry).
I don't really know how DiffData works. Does it fetch a batch of rows from origin, fetch the same rows from target, and compare them? If so, can a write that happens at the same time cause discrepancies?
Thank you.
OK, it seems to use the fetchSizeInRows setting:
# .fetchSizeInRows : Default is 1000. This affects the frequency of reads from Origin, and also the
# frequency of flushes to Target. A larger value will reduce the number of reads
# and writes, but will increase the memory requirements.
So I guess that with a heavy write/update workload, we can get false positives during validation?
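To illustrate the hypothesis above, here is a toy Python model of a page-based comparison. It is only a sketch: two dicts stand in for the origin and target tables, and `fetch_origin_page` and `diff_page` are hypothetical names, not CDM functions. It assumes that a page is read from origin first and that the target lookups happen slightly later.

```python
# Toy model only: two dicts stand in for the origin and target tables.
# This is NOT CDM's implementation; all function names are hypothetical.

def fetch_origin_page(origin, page_size):
    """Return a snapshot (copy) of up to page_size rows from origin."""
    keys = sorted(origin)[:page_size]
    return {k: origin[k] for k in keys}

def diff_page(page, target):
    """Compare a previously fetched origin page against the live target."""
    missing = [k for k in page if k not in target]
    mismatched = [k for k in page if k in target and target[k] != page[k]]
    return missing, mismatched

origin = {"a": 1, "b": 2, "c": 3}
target = dict(origin)  # both clusters start in sync

# Step 1: the validation job reads a page from origin.
page = fetch_origin_page(origin, page_size=1000)

# Step 2: the application writes to BOTH clusters after the page was read
# but before the target lookups happen (one delete, one update).
del origin["b"]
del target["b"]
origin["c"] = 30
target["c"] = 30

# Step 3: the stale page is compared against the live target.
missing, mismatched = diff_page(page, target)
print(missing)           # ['b']
print(mismatched)        # ['c']
print(origin == target)  # True
```

In this model both clusters hold identical data at the end, yet the comparison still reports one missing and one mismatched row, because the origin page is older than the target reads.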
- What version of CDM did you use here?
- What is the exact and full command-line and console output?
- Could you get the full table schema?
- Could you post the cdm.properties file used (with credentials sanitized, of course)?
- Could you upload the cdm generated log file?
Hi,
1/ CDM 5.6.0
2/ ./spark-3.5.6-bin-hadoop3-scala2.13/bin/spark-submit -v --properties-file confs/cdm.properties --master local[*] --driver-memory 12G --executor-memory 12G --class com.datastax.cdm.job.DiffData cassandra-data-migrator-5.6.0.jar
3/
CREATE TABLE keyspace.table (
doc_id ascii,
date timestamp,
metrics ascii,
PRIMARY KEY (doc_id, date)
) WITH CLUSTERING ORDER BY (date DESC)
4/ Same conf as the "migrate" step, but with a lower rate limit:
spark.cdm.connect.origin.host host1,host2
spark.cdm.connect.origin.port 9042
spark.cdm.connect.target.host newhost1,newhost2
spark.cdm.connect.target.port 9042
spark.cdm.schema.origin.keyspaceTable keyspace.table
spark.cdm.schema.target.keyspaceTable keyspace.table
spark.cdm.autocorrect.missing false
spark.cdm.autocorrect.mismatch false
spark.cdm.trackRun true
spark.cdm.perfops.numParts 50000
spark.cdm.perfops.batchSize 5
spark.cdm.perfops.ratelimit.origin 10000
spark.cdm.perfops.ratelimit.target 10000
spark.cdm.filter.cassandra.partition.min -9223372036854775808
spark.cdm.filter.cassandra.partition.max 0
5/ Do you need the full log? I would need to remove some information first. Extract:
25/10/16 17:01:51 INFO DiffJobSession: ThreadID: 108 Processing min: -1268029187626773712 max: -1267844720186036617
25/10/16 17:01:51 INFO Executor: Finished task 15.0 in stage 0.0 (TID 15). 2064 bytes result sent to driver
25/10/16 17:01:51 INFO TaskSetManager: Starting task 24.0 in stage 0.0 (TID 24) (migrator01, executor driver, partition 24, PROCESS_LOCAL, 10599 bytes)
25/10/16 17:01:51 INFO Executor: Running task 24.0 in stage 0.0 (TID 24)
25/10/16 17:01:51 INFO TaskSetManager: Finished task 15.0 in stage 0.0 (TID 15) in 1260630 ms on migrator01 (executor driver) (9/49529)
25/10/16 17:01:51 INFO DiffJobSession: ThreadID: 120 Processing min: -570188859318339544 max: -570004391877602449
25/10/16 17:01:51 INFO Executor: Finished task 12.0 in stage 0.0 (TID 12). 2064 bytes result sent to driver
25/10/16 17:01:51 INFO TaskSetManager: Starting task 25.0 in stage 0.0 (TID 25) (migrator01, executor driver, partition 25, PROCESS_LOCAL, 10599 bytes)
25/10/16 17:01:51 INFO TaskSetManager: Finished task 12.0 in stage 0.0 (TID 12) in 1261015 ms on migrator01 (executor driver) (10/49529)
25/10/16 17:01:51 INFO Executor: Running task 25.0 in stage 0.0 (TID 25)
25/10/16 17:01:51 INFO DiffJobSession: ThreadID: 117 Processing min: -3600804443188089728 max: -3600619975747352633
25/10/16 17:01:51 INFO Executor: Finished task 13.0 in stage 0.0 (TID 13). 2064 bytes result sent to driver
25/10/16 17:01:51 INFO TaskSetManager: Starting task 26.0 in stage 0.0 (TID 26) (migrator01, executor driver, partition 26, PROCESS_LOCAL, 10599 bytes)
25/10/16 17:01:51 INFO TaskSetManager: Finished task 13.0 in stage 0.0 (TID 13) in 1261020 ms on migrator01 (executor driver) (11/49529)
25/10/16 17:01:51 INFO Executor: Running task 26.0 in stage 0.0 (TID 26)
25/10/16 17:01:51 INFO DiffJobSession: ThreadID: 118 Processing min: -771627304603248376 max: -771442837162511281
25/10/16 17:01:52 INFO Executor: Finished task 11.0 in stage 0.0 (TID 11). 2064 bytes result sent to driver
25/10/16 17:01:52 INFO TaskSetManager: Starting task 27.0 in stage 0.0 (TID 27) (migrator01, executor driver, partition 27, PROCESS_LOCAL, 10599 bytes)
25/10/16 17:01:52 INFO Executor: Running task 27.0 in stage 0.0 (TID 27)
25/10/16 17:01:52 INFO TaskSetManager: Finished task 11.0 in stage 0.0 (TID 11) in 1261609 ms on migrator01 (executor driver) (12/49529)
25/10/16 17:01:52 INFO DiffJobSession: ThreadID: 116 Processing min: -6315058366193720272 max: -6314873898752983177
25/10/16 17:01:53 INFO Executor: Finished task 10.0 in stage 0.0 (TID 10). 2064 bytes result sent to driver
25/10/16 17:01:53 INFO TaskSetManager: Starting task 28.0 in stage 0.0 (TID 28) (migrator01, executor driver, partition 28, PROCESS_LOCAL, 10599 bytes)
25/10/16 17:01:53 INFO TaskSetManager: Finished task 10.0 in stage 0.0 (TID 10) in 1262582 ms on migrator01 (executor driver) (13/49529)
25/10/16 17:01:53 INFO Executor: Running task 28.0 in stage 0.0 (TID 28)
25/10/16 17:01:53 INFO DiffJobSession: ThreadID: 115 Processing min: -2697467385898530616 max: -2697282918457793521
25/10/16 17:01:53 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 2064 bytes result sent to driver
25/10/16 17:01:53 INFO TaskSetManager: Starting task 29.0 in stage 0.0 (TID 29) (migrator01, executor driver, partition 29, PROCESS_LOCAL, 10599 bytes)
25/10/16 17:01:53 INFO Executor: Running task 29.0 in stage 0.0 (TID 29)
25/10/16 17:01:53 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 1262728 ms on migrator01 (executor driver) (14/49529)
25/10/16 17:01:53 INFO DiffJobSession: ThreadID: 107 Processing min: -4782687335990663800 max: -4782502868549926705
25/10/16 17:01:54 INFO Executor: Finished task 14.0 in stage 0.0 (TID 14). 2064 bytes result sent to driver
25/10/16 17:01:54 INFO TaskSetManager: Starting task 30.0 in stage 0.0 (TID 30) (migrator01, executor driver, partition 30, PROCESS_LOCAL, 10599 bytes)
25/10/16 17:01:54 INFO TaskSetManager: Finished task 14.0 in stage 0.0 (TID 14) in 1263865 ms on migrator01 (executor driver) (15/49529)
25/10/16 17:01:54 INFO Executor: Running task 30.0 in stage 0.0 (TID 30)
25/10/16 17:01:54 INFO DiffJobSession: ThreadID: 119 Processing min: -8769397665200782552 max: -8769213197760045457
25/10/16 17:01:58 INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 2064 bytes result sent to driver
25/10/16 17:01:58 INFO TaskSetManager: Starting task 31.0 in stage 0.0 (TID 31) (migrator01, executor driver, partition 31, PROCESS_LOCAL, 10599 bytes)
25/10/16 17:01:58 INFO Executor: Running task 31.0 in stage 0.0 (TID 31)
25/10/16 17:01:58 INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 1268250 ms on migrator01 (executor driver) (16/49529)
25/10/16 17:01:58 INFO DiffJobSession: ThreadID: 112 Processing min: -4373169617554310680 max: -4372985150113573585
25/10/16 17:03:59 ERROR DiffJobSession: Missing target row found for key: [325138987663 %% 2025-10-16T14:54:46.299Z]
The data (identical on origin and target when I query it). Edit/Note: the data can be inserted, updated or deleted by our workflow/backend.
cqlsh> select doc_id, date, metrics, writetime(metrics) from keyspace.table where doc_id = '325138987663';
doc_id | date | metrics | writetime(metrics)
--------------+---------------------------------+----------------------------------------------------------------------------------------+--------------------
325138987663 | 2025-10-16 15:03:39.311000+0000 | {"provider":{"interactions":{"score":74,"updated_at":"2025-10-16T15:03:39.311084072Z"}}} | 1760627027637457
The writetime is ~12 seconds before the DiffJobSession error log entry; sometimes less, sometimes more, but always within the same minute or so. Thank you.
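For reference, the gap between the writetime and the error log entry can be computed as below. This is a small sketch that assumes the migrator host writes its Spark log in UTC+2 (the offset is an assumption, adjust it to the host's timezone) and that writetime is expressed in microseconds since the Unix epoch.

```python
from datetime import datetime, timedelta, timezone

# writetime(metrics) taken from the cqlsh output above.
writetime_us = 1760627027637457

# Cassandra writetime is in microseconds since the Unix epoch.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
written_at = epoch + timedelta(microseconds=writetime_us)

# Timestamp of the "Missing target row" log line (Spark's yy/MM/dd format).
# ASSUMPTION: the log is written in local time at UTC+2.
log_tz = timezone(timedelta(hours=2))
logged_at = datetime.strptime("25/10/16 17:03:59", "%y/%m/%d %H:%M:%S")
logged_at = logged_at.replace(tzinfo=log_tz)

delta_s = (logged_at - written_at).total_seconds()
print(written_at.isoformat())  # 2025-10-16T15:03:47.637457+00:00
print(round(delta_s, 1))       # 11.4
```

The log line has only one-second resolution, so the result is approximate.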
This could be a consistency issue with the target cluster. Can you please verify that the target cluster does not have any consistency issues? Can you also verify that the CDM migration and the validation were both run with spark.cdm.perfops.consistency.read and spark.cdm.perfops.consistency.write set to local_quorum?
Hi, thank you.
I'm using the default config for these settings, which is local_quorum :)
But maybe the problem comes from this: we have 2 DCs with updates, so DC1 gets the update from our application and, while the data is still replicating to DC2, CDM DiffData queries DC2 at about the same time?
Hi @Skunnyk
Writetime in Cassandra is typically set at the coordinator node at the time the write is first received. When the data is replicated to the remote DC, the original writetime is preserved and used for the replicated record. So even though you see the writetime on the record is a few seconds before the error was reported, it could very well have arrived a bit later.
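The effect described above can be sketched with a toy Python model. Everything here is hypothetical and for illustration only: the DC names, the 5-second lag, and the `visible_rows` helper are assumptions, not measurements from this cluster or Cassandra internals.

```python
# Toy model: a write carries one writetime, assigned when it is first
# accepted, plus a per-DC arrival time. All values are hypothetical.

def visible_rows(writes, dc, now_us):
    """Return the rows a read in `dc` can see at time now_us."""
    rows = {}
    for w in writes:
        if w["arrival_us"][dc] <= now_us:
            rows[w["key"]] = (w["value"], w["writetime_us"])
    return rows

write = {
    "key": "325138987663",
    "value": "metrics-v2",
    "writetime_us": 1_000_000,        # set once and preserved everywhere
    "arrival_us": {
        "dc1": 1_000_000,             # local DC sees the write immediately
        "dc2": 1_000_000 + 5_000_000, # remote DC: assumed 5 s replication lag
    },
}

# A read in dc2 two seconds after the write does not see the row yet.
early = visible_rows([write], "dc2", now_us=3_000_000)
# A read in dc2 six seconds after the write sees it, with the ORIGINAL writetime.
late = visible_rows([write], "dc2", now_us=7_000_000)

print(early)  # {}
print(late)   # {'325138987663': ('metrics-v2', 1000000)}
```

In this model the row looks as if it existed in dc2 since its writetime, even though a read at that moment would have reported it as missing.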
So, I believe the issue could be due to a slight delay in DC replication. If the missing records reported by CDM are always hot data (i.e. data that was written very recently) and never cold data (data that existed in the cluster before the CDM job was started), then it is most likely due to a replication delay.
Let us know if you were able to find anything more about this issue.
Hi,
Thank you for your time. Yes, it makes sense, as a manual query a few seconds later shows no more discrepancies.
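For anyone who wants to script that manual re-check, here is a minimal Python sketch. It assumes the log line format shown earlier in this thread, with the primary key columns (doc_id, date) separated by " %% ". The helper names are hypothetical, and keyspace.table is the sanitized table name used above.

```python
import re

# Matches lines such as:
# 25/10/16 17:03:59 ERROR DiffJobSession: Missing target row found for key: [325138987663 %% 2025-10-16T14:54:46.299Z]
PATTERN = re.compile(
    r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ERROR DiffJobSession: "
    r"Missing target row found for key: \[(?P<doc_id>\S+) %% (?P<date>\S+)\]$"
)

def missing_keys(lines):
    """Yield (doc_id, date) for every 'Missing target row' log line."""
    for line in lines:
        m = PATTERN.match(line.strip())
        if m:
            yield m.group("doc_id"), m.group("date")

def recheck_statement(doc_id, date, table="keyspace.table"):
    """Build a CQL query to run manually on both origin and target."""
    # Rewrite the trailing 'Z' as an explicit +0000 offset for the CQL literal.
    cql_date = date.replace("Z", "+0000")
    return (
        f"SELECT doc_id, date, metrics, writetime(metrics) FROM {table} "
        f"WHERE doc_id = '{doc_id}' AND date = '{cql_date}';"
    )

log_lines = [
    "25/10/16 17:01:58 INFO Executor: Running task 31.0 in stage 0.0 (TID 31)",
    "25/10/16 17:03:59 ERROR DiffJobSession: Missing target row found for key: "
    "[325138987663 %% 2025-10-16T14:54:46.299Z]",
]

for doc_id, date in missing_keys(log_lines):
    print(recheck_statement(doc_id, date))
```

Running the generated statements against both clusters some time after the DiffData job makes it easy to tell transient hot-data discrepancies from real ones.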