gh-ost icon indicating copy to clipboard operation
gh-ost copied to clipboard

getting Column 'message_id' cannot be null - while we have no null values in table.

Open tapuhi opened this issue 7 years ago • 23 comments

Gettign the following error:

2018-02-12 15:35:05 ERROR Error 1048: Column 'message_id' cannot be null; query=
                        replace /* gh-ost `shoutout`.`_messages_gho` */ into
                                `shoutout`.`_messages_gho`
                                        (`message_id`, `instance_id`, `data`, `date_created`, `date_updated`, `short_code`, `status`, `tags`, `date_sent`, `distribution_status`)
                                values
                                        (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

while: select count(1) from messages where message_id is null

    -> ;
+----------+
| count(1) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

tapuhi avatar Feb 12 '18 15:02 tapuhi

What is the table definition of messages and the gh-ost command you're running?

zmoazeni avatar Feb 12 '18 15:02 zmoazeni

command is

gh-ost \
--max-load=Threads_running=500 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--throttle-control-replicas="db-mysql-REDACTED1" \
--max-lag-millis=150000 \
--user="dbschema" \
--password="REDACTED" \
--host=db-mysql-REDACTED2 \
--database="shoutout" \
--table="messages" \
--alter="ADD visibility_status TINYINT,ADD INDEX (visibility_status)" \
--switch-to-rbr \
--allow-master-master \
--cut-over=default \
--exact-rowcount \
--concurrent-rowcount \
--default-retries=120 \
--panic-flag-file=/tmp/ghost.panic.flag \
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
--allow-on-master \
--initially-drop-ghost-table \
--initially-drop-socket-file \
--quiet \
--stack \
--execute 

messages table def:

CREATE TABLE `messages` (
  `message_id` varchar(50) NOT NULL,
  `instance_id` varchar(50) NOT NULL,
  `data` mediumtext NOT NULL,
  `date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `date_updated` bigint(20) NOT NULL,
  `short_code` varchar(20) DEFAULT NULL,
  `status` tinyint(4) NOT NULL DEFAULT '0',
  `tags` varchar(1024) DEFAULT NULL,
  `date_sent` bigint(20) DEFAULT NULL,
  `distribution_status` tinyint(4) DEFAULT NULL,
  PRIMARY KEY (`message_id`),
  UNIQUE KEY `short_code` (`short_code`),
  KEY `instance_status_date_created_ix` (`instance_id`,`date_created`,`status`),
  KEY `distribution_status` (`distribution_status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16

tapuhi avatar Feb 12 '18 15:02 tapuhi

Oh this is an interesting one. Perhaps @shlomi-noach has more ideas on it.

I don't know if gh-ost has any issues using a varchar primary key like that.

One thought though is perhaps there's a unicode issue going on where go can't parse the value and it's ending up as null in go-land. Which causes issues with the primary key constraint.

zmoazeni avatar Feb 12 '18 16:02 zmoazeni

The only column where a Unicode character can exist is the data column the message_id is actually uuid.

tapuhi avatar Feb 12 '18 16:02 tapuhi

I don't have an immediate explanation. varchar should work well. @tapuhi can you possibly run with --debug and paste the beginning of the output, up till the part where it's actually copying rows?

(I don't mind if this 2nd run reproduces or not reproduces the problem)

shlomi-noach avatar Feb 12 '18 16:02 shlomi-noach

2018/02/12 16:25:06 binlogsyncer.go:79: ^[[0;37m[info] create BinlogSyncer with config {99999 mysql db-mysql-shoutout0a.42.wixprod.net 3306 dbschema   false false <nil>}^[[0m
2018/02/12 16:25:06 binlogsyncer.go:246: ^[[0;37m[info] begin to sync binlog from position (mysq.001362, 964919904)^[[0m
2018/02/12 16:25:06 binlogsyncer.go:139: ^[[0;37m[info] register slave for master server db-mysql-shoutout0a.42.wixprod.net:3306^[[0m
2018/02/12 16:25:06 binlogsyncer.go:573: ^[[0;37m[info] rotate to (mysq.001362, 964919904)^[[0m
# Migrating `shoutout`.`messages`; Ghost table is `shoutout`.`_messages_gho`
# Migrating db-mysql-shoutout0a:3306; inspecting db-mysql-shoutout0a:3306; executing on db-mysql-shoutout0a
# Migration started at Mon Feb 12 16:25:05 +0000 2018
# chunk-size: 1000; max-lag-millis: 150000ms; dml-batch-size: 10; max-load: Threads_running=500; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# throttle-control-replicas count: 1
# postpone-cut-over-flag-file: /tmp/ghost.postpone.flag [set]
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.shoutout.messages.sock
Copy: 0/8373600 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 0s(copy); streamer: mysq.001362:965017356; State: migrating; ETA: N/A
2018-02-12 16:25:07 ERROR Error 1048: Column 'message_id' cannot be null; query=
                        replace /* gh-ost `shoutout`.`_messages_gho` */ into
                                `shoutout`.`_messages_gho`
                                        (`message_id`, `instance_id`, `data`, `date_created`, `date_updated`, `short_code`, `status`, `tags`, `date_sent`, `distribution_status`)
                                values
                                        (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

tapuhi avatar Feb 12 '18 16:02 tapuhi

I'm guessing this is either a data issue or a version issue of either MySQL or an older binary of gh-ost.

Using Percona 5.7 and the latest release of gh-ost, it works fine with contrived data:

create database shoutouts;
use shoutouts;
create table `messages` (
  `message_id` varchar(50) NOT NULL,
  `instance_id` varchar(50) NOT NULL,
  `data` mediumtext NOT NULL,
  `date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `date_updated` bigint(20) NOT NULL,
  `short_code` varchar(20) DEFAULT NULL,
  `status` tinyint(4) NOT NULL DEFAULT '0',
  `tags` varchar(1024) DEFAULT NULL,
  `date_sent` bigint(20) DEFAULT NULL,
  `distribution_status` tinyint(4) DEFAULT NULL,
  PRIMARY KEY (`message_id`),
  UNIQUE KEY `short_code` (`short_code`),
  KEY `instance_status_date_created_ix` (`instance_id`,`date_created`,`status`),
  KEY `distribution_status` (`distribution_status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16;

insert into messages (message_id, instance_id, data, date_updated) values ('message_id1', 'instance_id', 'data', 1);
insert into messages (message_id, instance_id, data, date_updated) values ('message_id2', 'instance_id', 'data', 2);
insert into messages (message_id, instance_id, data, date_updated) values ('message_id3', 'instance_id', 'data', 3);
~/Downloads/gh-ost-binary-linux-20180211093207/gh-ost \
--initially-drop-ghost-table \
--initially-drop-old-table \
--assume-rbr \
--verbose \
--exact-rowcount \
--port="33307" \
--assume-master-host="127.0.0.1:3306" \
--user="root" \
--password="" \
--database="shoutouts" \
--table="messages" \
--alter="ADD visibility_status TINYINT,ADD INDEX (visibility_status)" \
--execute 2>&1 | tee /tmp/ghost-output-$(date -uIs).txt

One thing that did catch my eye on your command was:

--switch-to-rbr \
--allow-master-master \
--allow-on-master \

This bit is beyond me, but I don't know about switching to RBR in a master-master setup like that. I would shy away from that myself and instead setup the environment for gh-ost to succeed without changing the replication settings on master during the run.

Can you perhaps try manually bisecting the table with mysqldump or on a replica with insert into clone_table select from message ... to see if you can find a 1 or more rows causing the same constraint issue? (Or pass/fail gh-osting with the subset of data)

zmoazeni avatar Feb 12 '18 16:02 zmoazeni

@tapuhi can you please resubmit the log? It seems corrupted, and does not include the interesting parts.

shlomi-noach avatar Feb 12 '18 16:02 shlomi-noach

@tapuhi What version of mysql are you using?

If the error on the replace statement, then the query is the one generated by gh-ost from parsing the binary logs.

As @zmoazeni pointed out, the combination of flags is interesting.

--switch-to-rbr 
--host=db-mysql-REDACTED2 
--allow-master-master 
--allow-on-master 

(Perhaps off-topic) @shlomi-noach does that work as the correct combination of settings? --switch-to-rbr assumes you're using a replica, and you're setting global binlog_format to ROW and restarting replication; if this were to run on a master that was STATEMENT/MIXED, there could still be connections writing SQL to the binary logs, though presumably gh-ost would fail if that happened. Should --switch-to-rbr and --allow-on-master be allowed together?

ggunson avatar Feb 14 '18 06:02 ggunson

Should --switch-to-rbr and --allow-on-master be allowed together?

gh-ost is fine with the combination, but will refuse to switch to RBR if the server has replicas. It seems to me like the combination above is just some copy+paste @tapuhi did. I do believe, from his presentation at PerconaLive dublin, he's running a master-master setup.

@tapuhi are you running STATEMENT or ROW at this time?

shlomi-noach avatar Feb 14 '18 06:02 shlomi-noach

Hi, sorry for the delayed answer i was not at work yesterday. Actually, it's not a copy past I did, I was playing with flags and forgot to remove the irrelevant ones :-) .

Yes we indeed have a master-master structure in Wix. And I've moved to ROW prior the gh-ost run .

@shlomi-noach Which log r u refering to, I am redirecting the stderr error and stdoutput to a log file( > logfile 2>&1 ) , this one ?

Regarding versions , I am using mysql - 5.7.19 and gh-ost 1.0.44.

tapuhi avatar Feb 14 '18 07:02 tapuhi

@zmoazeni Good idea, I will try to dump the table and see if it happens, the annoying part is that the table is 220G and dump will take a long time ;-)

tapuhi avatar Feb 14 '18 08:02 tapuhi

Which log r u refering to, I am redirecting the stderr error and stdoutput to a log file( > logfile 2>&1 ) , this one ?

yep.

shlomi-noach avatar Feb 14 '18 08:02 shlomi-noach

gh-ost.log.gz

here you go

tapuhi avatar Feb 14 '18 08:02 tapuhi

@tapuhi again, I need the log from the beginning.

shlomi-noach avatar Feb 14 '18 08:02 shlomi-noach

It's all the log , nothing was printed to screen that is not in file.

tapuhi avatar Feb 14 '18 09:02 tapuhi

My bad, I was running it with "quiet" mode :-) Re-running with debug and verbose on

gh-ost \
--max-load=Threads_running=500 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--throttle-control-replicas="db-mysql-shoutout0a.96.wixprod.net" \
--max-lag-millis=150000 \
--user="dbschema" \
--password="******" \
--host=db-mysql-shoutout0a.42.wixprod.net \
--database="shoutout" \
--table="messages" \
--alter="ADD visibility_status TINYINT,ADD INDEX (visibility_status)" \
--allow-master-master \
--cut-over=default \
--exact-rowcount \
--concurrent-rowcount \
--default-retries=120 \
--panic-flag-file=/tmp/ghost.panic.flag \
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
--allow-on-master \
--initially-drop-ghost-table \
--initially-drop-socket-file \
--stack \
--debug \
--verbose \
--execute  >  /tmp/gh-ost.log 2>&1

tapuhi avatar Feb 14 '18 10:02 tapuhi

gh-ost.log.gz

tapuhi avatar Feb 14 '18 10:02 tapuhi

Good idea, I will try to dump the table and see if it happens, the annoying part is that the table is 220G and dump will take a long time ;-)

@tapuhi yeah that can be a pain. mysqldump does have filtering available so you may want to try exporting chunks at a time. That may be easier to narrow down to interesting rows. I'm thinking of how git-bisect works with a process of elimination.

zmoazeni avatar Feb 14 '18 12:02 zmoazeni

I'm a bit lost and am hoping @zmoazeni is on the right path.

Otherwise, I'd like to verify what max_allowed_packet is, as I see the rows are pretty long.

@tapuhi I'd also attempt to go lower on dml_batch_size, go down to 1 -- but this is a hunch, nothing more.

shlomi-noach avatar Feb 14 '18 14:02 shlomi-noach

@tapuhi it may be that mysql parameter is set binlog_row_image=minimal, you should set binlog_row_image=full

rudy-gao avatar Mar 06 '18 08:03 rudy-gao

@tapuhi were you ever able to find the source of this issue?

aeb47 avatar Nov 12 '18 17:11 aeb47

@shlomi-noach I also facing same kind of issue.

I am using gh-ost version 1.0.48 mysql version is 5.7.34 binlog_row_image is FULL I am using below command- ./gh-ost --max-load=threads_connected=200,threads_running=25 --chunk-size=500 --allow-on-master --throttle-control-replicas="10.1.1.1:3306,10.1.1.2:3306,10.1.1.3:3306" --max-lag-millis=2000 --port=3306 --user=ghostusr --password=********* --host=... --database=warehouse --table=batch --verbose --alter="ADD INDEX ix_settletime_type(settletime,type)" --cut-over=default --default-retries=120 --nice-ratio=0 --throttle-flag-file=/tmp/gh-ost.throttle --panic-flag-file=/tmp/ghost.panic.flag --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag --execute 1> /tmp/ghost.out 2>/tmp/ghost.err

As per ghost.err file (also updating complete output below), complete data was copied at "2021-09-02 08:26:26 INFO Row copy complete" successfully and same time ghost.out file generated message "State: postponing cut-over; ETA: due" means everything was fine for cut-over but as I tried to verify ghost.err file before taking cut-over got this error "2021-09-02 08:30:01 ERROR Error 1048: Column 'guid' cannot be null;" means after 4 minutes. May you help me to identify why this happened. I tried 3 times and each time getting same issue.

Table structure is-

CREATE TABLE `batch` (
  `uid` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary Key Value',
  `guid` char(15) NOT NULL COMMENT 'Application assigned batch guid',
  `client_uid` int(10) unsigned NOT NULL COMMENT 'Client Id',
  `name` varchar(25) NOT NULL COMMENT 'Customer provided batch name',
  `type` tinyint(4) NOT NULL COMMENT 'Batch type 0 - non-financial 1 - credit 2 - debit',
  `status` tinyint(4) NOT NULL COMMENT 'Batch status 0 - open 1 - closed 2 - hold 3 - pre-settle 4 - settled',
  `closedby` char(6) DEFAULT NULL COMMENT 'Identifies who closed batch system or client',
  `opentime` datetime NOT NULL COMMENT 'The Date the batch was opened',
  `closetime` datetime DEFAULT NULL COMMENT 'Batch close time',
  `settletime` datetime DEFAULT NULL COMMENT 'Batch settle time',
  `dbtime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Database insert time',
  `who` varchar(30) NOT NULL COMMENT 'Insert user and machine name',
  PRIMARY KEY (`uid`),
  UNIQUE KEY `ix_guid` (`guid`),
  KEY `ix_client_uid` (`client_uid`),
  KEY `ix_name` (`name`),
  KEY `ix_status` (`status`),
  KEY `ix_settletime` (`settletime`),
  KEY `ix_opentime` (`opentime`),
  KEY `ix_type` (`type`)
) ENGINE=InnoDB AUTO_INCREMENT=40520073 DEFAULT CHARSET=utf8;

ghost.err file output is-

2021-09-02 08:20:56 INFO starting gh-ost 1.0.48
2021-09-02 08:20:56 INFO Migrating `warehouse`.`batch`
2021-09-02 08:20:56 INFO connection validated on 10.1.1.1:3306
2021-09-02 08:20:56 INFO User has REPLICATION CLIENT, REPLICATION SLAVE privileges, and has ALL privileges on `warehouse`.*
2021-09-02 08:20:56 INFO binary logs validated on 10.1.1.1:3306
2021-09-02 08:20:56 INFO Restarting replication on 10.1.1.1:3306 to make sure binlog settings apply to replication thread
2021-09-02 08:20:56 INFO Inspector initiated on xxxxxx.com:3306, version 5.7.34-log
2021-09-02 08:20:56 INFO Table found. Engine=InnoDB
2021-09-02 08:20:56 INFO Estimated number of rows via EXPLAIN: 5127232
2021-09-02 08:20:56 INFO Recursively searching for replication master
2021-09-02 08:20:56 INFO Master found to be xxxxx.com:3306
2021-09-02 08:20:56 INFO log_slave_updates validated on 10.1.1.1:3306
2021-09-02 08:20:56 INFO connection validated on 10.1.1.1:3306
2021-09-02 08:20:56 INFO Connecting binlog streamer at mysql-bin.004304:923988
2021-09-02 08:20:56 INFO connection validated on 10.1.1.1:3306
2021-09-02 08:20:56 INFO connection validated on 10.1.1.1:3306
2021-09-02 08:20:56 INFO will use time_zone='SYSTEM' on applier
2021-09-02 08:20:56 INFO Examining table structure on applier
2021-09-02 08:20:56 INFO Applier initiated on xxxxxxx.com:3306, version 5.7.34--log
2021-09-02 08:20:56 INFO Dropping table `warehouse`.`_batch_ghc`
2021-09-02 08:20:56 INFO Table dropped
2021-09-02 08:20:56 INFO Creating changelog table `warehouse`.`_batch_ghc`
2021-09-02 08:20:56 INFO rotate to next log from mysql-bin.004304:0 to mysql-bin.004304
2021-09-02 08:20:56 INFO Changelog table created
2021-09-02 08:20:56 INFO Creating ghost table `warehouse`.`_batch_gho`
2021-09-02 08:20:56 INFO Ghost table created
2021-09-02 08:20:56 INFO Altering ghost table `warehouse`.`_batch_gho`
2021-09-02 08:20:56 INFO Ghost table altered
2021-09-02 08:20:56 INFO Intercepted changelog state GhostTableMigrated
2021-09-02 08:20:56 INFO Waiting for ghost table to be migrated. Current lag is 0s
2021-09-02 08:20:56 INFO Handled changelog state GhostTableMigrated
2021-09-02 08:20:56 INFO Chosen shared unique key is PRIMARY
2021-09-02 08:20:56 INFO Shared columns are uid,guid,client_uid,name,type,status,closedby,opentime,closetime,settletime,dbtime,who
2021-09-02 08:20:56 INFO Listening on unix socket file: /tmp/gh-ost.warehouse.batch.sock
2021-09-02 08:20:56 INFO Migration min values: [24747176]
2021-09-02 08:20:56 INFO Migration max values: [40522872]
2021-09-02 08:20:56 INFO Waiting for first throttle metrics to be collected
2021-09-02 08:20:56 INFO First throttle metrics collected
2021-09-02 08:23:45 INFO rotate to next log from mysql-bin.004305:268453047 to mysql-bin.004305
2021-09-02 08:23:45 INFO rotate to next log from mysql-bin.004305:0 to mysql-bin.004305
2021-09-02 08:26:26 INFO Row copy complete
2021-09-02 08:30:01 ERROR Error 1048: Column 'guid' cannot be null; query=
                        replace /* gh-ost `warehouse`.`_batch_gho` */ into
                                `warehouse`.`_batch_gho`
                                        (`uid`, `guid`, `client_uid`, `name`, `type`, `status`, `closedby`, `opentime`, `closetime`, `settletime`, `dbtime`, `who`)
                                values
                                        (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                ; args=[<nil> <nil> <nil> <nil> <nil> 2 System <nil> 2021-09-02 08:30:00 <nil> 2021-09-02 08:30:00 +0000 UTC <nil>]

ghost.out file last output is-

Copy: 5123000/5127232 99.9%; Applied: 4; Backlog: 0/1000; Time: 5m23s(total), 5m23s(copy); streamer: mysql-bin.004305:215096234; State: migrating; E
TA: 0s
Copy: 5238104/5238104 100.0%; Applied: 4; Backlog: 0/1000; Time: 5m30s(total), 5m29s(copy); streamer: mysql-bin.004305:225903899; State: migrating;
ETA: due
Copy: 5238104/5238104 100.0%; Applied: 4; Backlog: 0/1000; Time: 5m30s(total), 5m29s(copy); streamer: mysql-bin.004305:225913107; State: postponing
cut-over; ETA: due

zafermalik avatar Sep 02 '21 11:09 zafermalik