
Migrator wrongly detects a clustering key as empty

Open carlo4002 opened this issue 2 years ago • 6 comments

Hello guys,

I am running the migrator job, and I got a strange error.

The job does not finish, and it gives this message:

Error while encoding: java.lang.RuntimeException: The 1th field 'userprofiletype' of input row cannot be null.

The field userprofiletype is a clustering key; it is not null.

I was able to extract some of the reportedly missing primary keys, and the data does not look corrupted.

I attach the full logs here: issue.txt

Here is the table description.

Table

CREATE TABLE keyspace.userprofiles (
useruuid uuid,
userprofiletype text,
about text static,
accountuuid uuid,
active boolean static,
companyname text static,
completedate timestamp static,
currency text,
emailverified boolean static,
employer text static,
expediaid text,
expediaoptin boolean,
firstname text static,
gender text static,
hometown text static,
inserted timestamp static,
languages text static,
lastname text static,
locale text,
maidenname text static,
mediacollectionitemuuid uuid static,
mediacollectionuuid uuid,
membersince timestamp,
middlename text static,
profileuuid uuid static,
propertymgrhaflag boolean static,
propertymgrmemberflag boolean static,
publicuuid uuid static,
school text static,
screenlocation text static,
source_facebook_user_id text static,
suffix text static,
title text static,
updated timestamp static,
updated_account timestamp,
updated_publicprofilepicture timestamp static,
updated_userpublicprofile timestamp static,
updated_users timestamp static,
PRIMARY KEY (useruuid, userprofiletype)
) WITH CLUSTERING ORDER BY (userprofiletype ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';

It is difficult for me to check what the cause of this problem is. I would appreciate any help.
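
For anyone trying to narrow this down, below is a minimal spot-check sketch (not part of the original report) that reads the source table through the Spark Cassandra connector and lists rows whose clustering key comes back null or as an empty string. The keyspace and table names are taken from the schema above; the connection host is a placeholder.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical spot-check, not part of the migrator itself.
val spark = SparkSession.builder()
  .appName("userprofiletype-spot-check")
  .config("spark.cassandra.connection.host", "cassandra-host") // placeholder host
  .getOrCreate()

val userprofiles = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "keyspace", "table" -> "userprofiles"))
  .load()

// Rows whose clustering key deserializes as null or as an empty string.
userprofiles
  .where(col("userprofiletype").isNull || col("userprofiletype") === "")
  .select("useruuid", "userprofiletype")
  .show(20, truncate = false)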

carlo4002 avatar Oct 13 '22 14:10 carlo4002

@carlo4002 which Scylla Migrator branch did you use to build the project? Also, please share the versions of Java, Scala, and Spark.

cc @tarzanek @igorribeiroduarte

hopugop avatar Oct 19 '22 20:10 hopugop

Yes, some null checks were added in a recent version, and we can eventually improve them, so knowing which version of the migrator is used is crucial (ditto for Spark, since current master only works on 2.4; for 3.1 there is a different branch).

tarzanek avatar Oct 20 '22 08:10 tarzanek

Hello @tarzanek, I used the master branch at commit b69a5908b12b876c0c62039d3dd60e7cb80c8d09.

We are using Spark 2.4.4 as recommended, Java 1.8.0_212 (Amazon.com Inc.), and Scala 2.11.12.

Sorry I did not answer earlier; I am available in the European time zone.

carlo4002 avatar Oct 20 '22 09:10 carlo4002

Yesterday I built the migrator with the latest commit on master; I will try to run it again with consistency local_quorum.
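
For reference, at the Spark Cassandra connector level the read consistency is controlled by the spark.cassandra.input.consistency.level property; the migrator exposes this through its own configuration file, so the snippet below is only a hedged illustration of the underlying connector setting, not the migrator's config format.

import org.apache.spark.sql.SparkSession

// Illustration of the underlying connector setting only: read with LOCAL_QUORUM.
val spark = SparkSession.builder()
  .appName("read-with-local-quorum")
  .config("spark.cassandra.connection.host", "cassandra-host") // placeholder
  .config("spark.cassandra.input.consistency.level", "LOCAL_QUORUM")
  .getOrCreate()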

carlo4002 avatar Oct 20 '22 09:10 carlo4002

https://github.com/scylladb/scylla-migrator/blob/master/src/main/scala/com/scylladb/migrator/readers/Cassandra.scala#L132 should validate partition keys as well as clustering keys; I guess we can try to add that validation.
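
As a rough sketch of what that extra validation could look like (assuming the rows carry a Spark SQL schema and the full list of primary-key column names is known; this is not the actual code at Cassandra.scala#L132):

import org.apache.spark.sql.Row

// Hypothetical helper, not the migrator's actual code: given a row and the
// names of all primary-key columns (partition keys and clustering keys),
// return the ones that are null so the caller can log or skip those rows
// instead of failing the whole encoding step.
def nullPrimaryKeyColumns(row: Row, primaryKeyColumns: Seq[String]): Seq[String] =
  primaryKeyColumns.filter(name => row.isNullAt(row.fieldIndex(name)))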

tarzanek avatar Oct 20 '22 10:10 tarzanek

I will try to run it again with consistency local_quorum

@carlo4002 did you get a chance to test? We'd be happy to hear back about your results :smile:

hopugop avatar Nov 29 '22 18:11 hopugop

I am closing this issue since we were not able to reproduce it and we have had no news for some time. We can re-open it if someone can confirm that it still happens.

julienrf avatar Aug 25 '24 15:08 julienrf