mysql2sqlite icon indicating copy to clipboard operation
mysql2sqlite copied to clipboard

Hiccup while processing long comma-delimited string

Open crenelle opened this issue 4 years ago • 9 comments

as per "README.md" from author of "mysql2sqlite" we used these:

  1. Dump MySQL DB mysqldump --skip-extended-insert --compact trading_db crb_portfolios_table

  2. Convert the dump to SQLite3 DB ./mysql2sqlite crb_portfolios_table | sqlite3 trading_db


This short dump file will reproduce the error:

/*!40101 SET @saved_cs_client = @@character_set_client /; /!40101 SET character_set_client = utf8 /; CREATE TABLE crb_portfolios_table ( portfolio_name varchar(100) NOT NULL DEFAULT '', portfolio_selections_string blob, PRIMARY KEY (portfolio_name) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; /!40101 SET character_set_client = @saved_cs_client */; INSERT INTO crb_portfolios_table VALUES ('all active futures liquid 15-80 cash 5',_binary 'AA,AD,AZ,BO,BP,C-,CB,CC,CD,CL,CT,DF,EC,ED,ES,FF,FX,GC,HG,HH,HO,HS,JY,KC,KS,KW,LC,LH,LJ,MX,MZ,NG,NH,NN,NO,NP,NX,OX,PI,RB,RJ,S-,SB,SI,SM,SP,SS,SZ,TP,TT,VX,W-,X-,ZM'); INSERT INTO crb_portfolios_table VALUES ('all active futures liquid 5-30 all inclusive',_binary '5L,A2,AA,AD,AZ,B-,BF,BG,BM,BO,BP,BS,BT,C-,CA,CB,CC,CD,CG,CK,CL,CT,DF,DI,DR,DX,EC,ED,ES,F-,FA,FB,FC,FF,FV,FX,FY,GC,GH,GM,GX,HG,HH,HN,HO,HS,IA,IB,IE,II,IL,IT,IX,JY,K6,K7,K8,KB,KC,KD,KL,KM,KS,KW,L-,LC,LF,LH,LJ,LL,LO,LQ,LW,ME,MH,MW,MX,MZ,ND,NE,NF,NG,NH,NI,NK,NN,NO,NP,NV,NX,OT,OX,PA,PI,PL,PO,PP,PU,QB,QE,R-,RB,RE,RJ,S-,SB,SF,SI,SM,SP,SS,SV,SZ,T2,TE,TO,TP,TQ,TS,TT,TU,TY,U0,UG,US,V4,V5,V9,VA,VC,VD,VH,VI,VO,VP,VR,VX,W-,WB,WC,WG,WI,WO,WQ,WR,X-,XO,XT,XU,XV,Y1,Y2,YC,YD,YE,YG,YK,YQ,YS,YT,YV,YW,YX,Z0,Z2,ZM,ZN,ZW'); INSERT INTO crb_portfolios_table VALUES ('all active options liquid 15-80 cash 5',_binary 'AD,BO,BP,C-,CB,CC,CD,CL,CT,DF,EC,ED,ES,FF,FX,GC,HG,HO,HS,JY,KC,KW,LC,LH,MX,MZ,NG,NN,NX,RB,RJ,S-,SB,SI,SM,SP,SS,SZ,W-,X-,ZM'); INSERT INTO crb_portfolios_table VALUES ('all futures active and liquid',_binary '2B,2C,2G,2H,2I,2J,2K,2L,2N,2S,2T,2U,2V,2W,5A,5D,5F,5G,5L,5P,5S,5V,5Z,AA,AD,AZ,B-,BF,BM,BO,BP,BS,BT,C-,CA,CB,CC,CD,CG,CK,CL,DF,EC,ED,EI,ES,F-,FA,FF,FV,FX,GC,GH,GM,HG,HH,HN,HO,HS,IL,JY,K6,K7,KB,KC,KD,KS,KW,L-,LC,LF,LH,LJ,MH,MX,MZ,ND,NG,NH,NN,NO,NP,NX,OT,OX,PI,PO,R-,RB,RE,RJ,S-,SB,SI,SM,SP,SS,SZ,TS,TT,TU,TY,U0,UG,US,VA,VC,VD,VH,VI,VP,VR,VX,W-,WG,WI,WJ,WO,WQ,WS,X-,XO,XT,XU,XV,YC,YG,YK,YQ,YS,Z2,ZM'); INSERT INTO crb_portfolios_table VALUES ('all options active and liquid',_binary 'AD,B-,BF,BO,BP,C-,CB,CC,CD,CK,CL,CT,DF,EC,ED,ES,F-,FF,FV,FX,GC,GH,GM,HG,HO,HS,JY,KC,KW,L-,LC,LF,LH,MH,MX,MZ,NG,NN,NP,NX,R-,RB,RJ,S-,SB,SI,SM,SP,SS,SZ,TU,TY,UG,US,VD,VI,W-,WC,WI,X-,ZM'); INSERT INTO crb_portfolios_table VALUES ('Currencies active and liquid futures',_binary '5A,5D,5F,5G,5P,5V,AA,AD,BP,CD,EC,JY,KD,NP,U0,VD');

crenelle avatar Jan 07 '20 23:01 crenelle

Thanks for the report. Could you please paste the SQL code to a formatted paragraph (using 3 consecutive bacticks on a pre-paragraph line and another 3 on a post-paragraph line)? Currently when I formatted the SQL code, mysql2sqlite worked for me without any issues.

dumblob avatar Jan 08 '20 09:01 dumblob

Spoke with author of the above bug report.

more info as of "Fri Jan 10 21:00:37 CST 2020"

The tables that cause errors are crb_portfolios2_table crb_portfolios_table and are the only 2 tables that contain a column with the data type of "BLOB"'

It seems that mysqldump exports data of type "blob" by putting the string "_binary " before the data to be inserted It imports OK if I remove the "_binary " string from the "INSERT" directive

We are able to work around the problem.

crenelle avatar Jan 11 '20 03:01 crenelle

The following are the only substitutions made to INSERT... lines:

  # first replace \\ by \_ that mysqldump never generates to deal with
  # sequnces like \\n that should be translated into \n, not \<LF>.
  # After we convert all escapes we replace \_ by backslashes.
  gsub( /\\\\/, "\\_" )

  # single quotes are escaped by another single quote
  gsub( /\\'/, "''" )
  gsub( /\\n/, "\n" )
  gsub( /\\r/, "\r" )
  gsub( /\\"/, "\"" )
  gsub( /\\\032/, "\032" )  # substitute char

  gsub( /\\_/, "\\" )

So I still can't find anything regarding _binary . If you'll find some time, please post your SQL code here (but formatted please to not lose any information). If it's a bug in mysql2sqlite, I'd like to fix it.

dumblob avatar Jan 11 '20 10:01 dumblob


/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `crb_portfolios_table` (
  `portfolio_name` varchar(100) NOT NULL DEFAULT '',
  `portfolio_selections_string` blob,
  PRIMARY KEY (`portfolio_name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;
INSERT INTO `crb_portfolios_table` VALUES ('all active futures liquid 15-80 cash 5',_binary 'AA,AD,AZ,BO,BP,C-,CB,CC,CD,CL,CT,DF,EC,ED,ES,FF,FX,GC,HG,HH,HO,HS,JY,KC,KS,KW,LC,LH,LJ,MX,MZ,NG,NH,NN,NO,NP,NX,OX,PI,RB,RJ,S-,SB,SI,SM,SP,SS,SZ,TP,TT,VX,W-,X-,ZM');
INSERT INTO `crb_portfolios_table` VALUES ('all active futures liquid 5-30 all inclusive',_binary '5L,A2,AA,AD,AZ,B-,BF,BG,BM,BO,BP,BS,BT,C-,CA,CB,CC,CD,CG,CK,CL,CT,DF,DI,DR,DX,EC,ED,ES,F-,FA,FB,FC,FF,FV,FX,FY,GC,GH,GM,GX,HG,HH,HN,HO,HS,IA,IB,IE,II,IL,IT,IX,JY,K6,K7,K8,KB,KC,KD,KL,KM,KS,KW,L-,LC,LF,LH,LJ,LL,LO,LQ,LW,ME,MH,MW,MX,MZ,ND,NE,NF,NG,NH,NI,NK,NN,NO,NP,NV,NX,OT,OX,PA,PI,PL,PO,PP,PU,QB,QE,R-,RB,RE,RJ,S-,SB,SF,SI,SM,SP,SS,SV,SZ,T2,TE,TO,TP,TQ,TS,TT,TU,TY,U0,UG,US,V4,V5,V9,VA,VC,VD,VH,VI,VO,VP,VR,VX,W-,WB,WC,WG,WI,WO,WQ,WR,X-,XO,XT,XU,XV,Y1,Y2,YC,YD,YE,YG,YK,YQ,YS,YT,YV,YW,YX,Z0,Z2,ZM,ZN,ZW');
INSERT INTO `crb_portfolios_table` VALUES ('all active options liquid 15-80 cash 5',_binary 'AD,BO,BP,C-,CB,CC,CD,CL,CT,DF,EC,ED,ES,FF,FX,GC,HG,HO,HS,JY,KC,KW,LC,LH,MX,MZ,NG,NN,NX,RB,RJ,S-,SB,SI,SM,SP,SS,SZ,W-,X-,ZM');
INSERT INTO `crb_portfolios_table` VALUES ('all futures active and liquid',_binary '2B,2C,2G,2H,2I,2J,2K,2L,2N,2S,2T,2U,2V,2W,5A,5D,5F,5G,5L,5P,5S,5V,5Z,AA,AD,AZ,B-,BF,BM,BO,BP,BS,BT,C-,CA,CB,CC,CD,CG,CK,CL,DF,EC,ED,EI,ES,F-,FA,FF,FV,FX,GC,GH,GM,HG,HH,HN,HO,HS,IL,JY,K6,K7,KB,KC,KD,KS,KW,L-,LC,LF,LH,LJ,MH,MX,MZ,ND,NG,NH,NN,NO,NP,NX,OT,OX,PI,PO,R-,RB,RE,RJ,S-,SB,SI,SM,SP,SS,SZ,TS,TT,TU,TY,U0,UG,US,VA,VC,VD,VH,VI,VP,VR,VX,W-,WG,WI,WJ,WO,WQ,WS,X-,XO,XT,XU,XV,YC,YG,YK,YQ,YS,Z2,ZM');
INSERT INTO `crb_portfolios_table` VALUES ('all options active and liquid',_binary 'AD,B-,BF,BO,BP,C-,CB,CC,CD,CK,CL,CT,DF,EC,ED,ES,F-,FF,FV,FX,GC,GH,GM,HG,HO,HS,JY,KC,KW,L-,LC,LF,LH,MH,MX,MZ,NG,NN,NP,NX,R-,RB,RJ,S-,SB,SI,SM,SP,SS,SZ,TU,TY,UG,US,VD,VI,W-,WC,WI,X-,ZM');
INSERT INTO `crb_portfolios_table` VALUES ('Currencies active and liquid futures',_binary '5A,5D,5F,5G,5P,5V,AA,AD,BP,CD,EC,JY,KD,NP,U0,VD');

crenelle avatar Jan 12 '20 18:01 crenelle

This is the script we use to work around the problem.


#!/opt/local/bin/perl
#
#	crb.export.for.sqlite - export select tables for import to sqlite
#
#	20200105-01	original incarnation
#	20200106-01	added code (line 28) so that Mike can put the output wherever he likes - currently set to the users home directory
#	20200112-01	added code to remove "_binary " string from output files that contain "blob" data type
#
#	use diagnostics;
	use strict;
	use warnings;
	use 5.010;
	
	if( $#ARGV > -1 ) {								# if value was passed
		print "[$ARGV[0]] passed as argument\n";	# print debug message
		die "no argumants allowed\n";				# and exit
	}

	print "crb.export.for.sqlite ................................."; 
	my($DATE) = &time_stamp() ;
	print " - START  $DATE\n";
	my($TIMESTAMP) = &time_stamp() ;								# create time stamp
	my($DAY_OF_WEEK) = `date +%a|tr -d "\n"`;						# get day of week
	my($LOG_STAMP) = "$TIMESTAMP $DAY_OF_WEEK";						# create log timestamp

	my($CUTOFF_DATE) = &cutoff_date() ;								# set CUTOFF_DATE back 10 days to rule out symbols no longer trading

#	my($BASE_DIRECTORY) = "/trading/data/crb";						# where the crb files reside - ben
	my($BASE_DIRECTORY) = $ENV{'HOME'}; 							# where the crb files reside - mike
	if ( ! -d "$BASE_DIRECTORY" ) {									# if directory does not exist
			die"[$BASE_DIRECTORY] does not exist\n\n";				# then complain and exit
	}

	my($EXPORT_DIRECTORY) = "$BASE_DIRECTORY/sqlite.export";		# where the sqlite.export files reside
	if ( ! -d "$EXPORT_DIRECTORY" ) {								# if directory does not exist
			mkdir("$EXPORT_DIRECTORY");								# then make one
	}

	my($SQLITE3_DIRECTORY) = "$BASE_DIRECTORY/sqlite3.database";	# where the sqlite database files reside
	if ( ! -d "$SQLITE3_DIRECTORY" ) {								# if directory does not exist
			mkdir("$SQLITE3_DIRECTORY");							# then make one
	}

	my(@LIST_OF_TABLES) = `echo "SHOW TABLES" | /usr/local/mysql/bin/mysql --skip-column-names trading_db`;	# get list of table names
	my(@LIST_OF_ACTIVE_OPTIONS) = `echo "select symbol
									 FROM crb_active_options_table
									 WHERE f_max_date>\\\"$CUTOFF_DATE\\\"
									 AND (ccm_exchange=\\\"CBOT\\\"
									 OR ccm_exchange=\\\"CME\\\"
									 OR ccm_exchange=\\\"ICE-US\\\"
									 OR ccm_exchange=\\\"NYMEX\\\")									 
									;
				" | /usr/local/mysql/bin/mysql --skip-column-names trading_db`;	# get list of table names for US exchanges


	my($TABLE_NAME);
	my($ACTIVE_OPTION);
	my($test_symbol);
	my(@EXPORT_TABLE_NAMES);
	my($LINE_COUNTER)=0;

#	print "-----------\n";
	foreach $ACTIVE_OPTION (@LIST_OF_ACTIVE_OPTIONS) {	# convert symbols to table name endings
		chomp($ACTIVE_OPTION);
		$ACTIVE_OPTION =~ tr/-/_/;
#		$LINE_COUNTER++;
#		print "$LINE_COUNTER [$ACTIVE_OPTION]\n";
	}
#	print "-----------\n";

#exit;

	print "-----------\n";
	foreach $TABLE_NAME (@LIST_OF_TABLES) {							# only pass names of tables to export
		chomp($TABLE_NAME);
		if (index($TABLE_NAME, "crb_avg_history_table") != -1) {	# skip crb_avg_history_table
			next;
		}
		
		if (index($TABLE_NAME, "crb_futures_table") == 0) {			# if crb_futures_table
			$test_symbol=substr($TABLE_NAME,-2);					# get symbol

			if (grep { $test_symbol eq $_ } @LIST_OF_ACTIVE_OPTIONS) {
				;
#			   print "test_symbol $test_symbol is in LIST_OF_ACTIVE_OPTIONS\n";
			} else {
			   next;
			}
		}

		if (index($TABLE_NAME, "crb_options_table") == 0) {			# if crb_options_table
			$test_symbol=substr($TABLE_NAME,-2);					# get symbol

			if (grep { $test_symbol eq $_ } @LIST_OF_ACTIVE_OPTIONS) {
				;
#			   print "test_symbol $test_symbol is in LIST_OF_ACTIVE_OPTIONS\n";
			} else {
			   next;
			}
		}

		$LINE_COUNTER++;

		my(@BLOB_INDICATOR) = `echo "describe \\\`$TABLE_NAME\\\`
			" | /usr/local/mysql/bin/mysql -N trading_db`;
		chomp(@BLOB_INDICATOR);

##		if ($LINE_COUNTER < 169) {	# code for testing to limit run to a specific subset of tables
##			next;
##		}

#		print "$LINE_COUNTER [$TABLE_NAME]\n";
#		print "$LINE_COUNTER [mysqldump trading_db $TABLE_NAME > $EXPORT_DIRECTORY/$TIMESTAMP.$TABLE_NAME]\n"
		$DATE = &time_stamp() ;
		printf '%4s',$LINE_COUNTER;

		print " $DATE.$TABLE_NAME";

		if (grep { (index($_, "blob") != -1)  } @BLOB_INDICATOR) {
			print " - [contains blob]";
			# remove string "_binary " from output file;
			`mysqldump  --skip-extended-insert --compact trading_db $TABLE_NAME | sed 's/_binary //' > $EXPORT_DIRECTORY/$TIMESTAMP.$TABLE_NAME`;	# mysql2sqlite version
		} else {
			`mysqldump  --skip-extended-insert --compact trading_db $TABLE_NAME > $EXPORT_DIRECTORY/$TIMESTAMP.$TABLE_NAME`;	# mysql2sqlite version
		}
		print "\n";


		`mysql2sqlite $EXPORT_DIRECTORY/$TIMESTAMP.$TABLE_NAME | sqlite3 $SQLITE3_DIRECTORY/trading_db`;					# import data into sqlite3
		`rm $EXPORT_DIRECTORY/$TIMESTAMP.$TABLE_NAME`;			# delete dump file after import into sqlite3 database

##		if ($LINE_COUNTER >= 5) {	# code for testing to limit run to a set number of tables
##			exit;
##		}

	}
	print "-----------\n";

	print "crb.export.for.sqlite ................................."; 
	$DATE = &time_stamp() ;
	print " - DONE - $DATE\n";

exit;

# ____________________________________________________________________________________
# subroutines

# _____________
sub time_stamp {
  my ($stamp);
  my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);

        $year += 1900;
        $mon++;
        $stamp = sprintf("%4d%2.2d%2.2d-%2.2d%2.2d%2.2d",$year,$mon,$mday,$hour,$min,$sec);
        return($stamp);
}

# _____________
sub cutoff_date {

	my ($now) = time();
	my ($cutoff_date) = $now - (10 * 86400);
	my ($day, $mon, $year) = (localtime($cutoff_date))[3, 4, 5];
	$cutoff_date=sprintf("%04d-%02d-%02d", $year+1900, $mon+1, $day);
	return($cutoff_date)
}

crenelle avatar Jan 12 '20 18:01 crenelle

@crenelle thanks for the additional info. I wasn't fully aware of this complication with _binary. I think a better solution than removing _binary could be to use mysqldump --hex-blob ... - could you test it on your DB?

dumblob avatar Jan 12 '20 19:01 dumblob

On 1/12/20 11:10 AM, dumblob wrote:

@crenelle https://github.com/crenelle thanks for the additional info. I wasn't fully aware of this complication with |_binary|. I think a better solution than removing |_binary| could be to use |mysqldump --hex-blob ...| - could you test it on your DB?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dumblob/mysql2sqlite/issues/66?email_source=notifications&email_token=AACK6RO42NR7TDYTZEFQTYDQ5NTK7A5CNFSM4KEAP3X2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIXBWQY#issuecomment-573446979, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACK6RPSB74QP5SGLAC2F5DQ5NTK7ANCNFSM4KEAP3XQ.

Sure can. Ben has been doing the wild-eyed scripting, I'll tell him about the different approach tomorrow!

Tnx,

-Mike

--

Michael Brian Bentley Application Development Consulting Crenelle Technologies LLC

[email protected] +1 (206) 601-7758 17823 McLean Rd SW Vashon, WA 98070-5421

crenelle avatar Jan 13 '20 07:01 crenelle

@crenelle thanks for the additional info. I wasn't fully aware of this complication with _binary. I think a better solution than removing _binary could be to use mysqldump --hex-blob ... - could you test it on your DB?

after using --hex-blob in mysqldump command, mysql2sqlite script will warn WARN Hex number trimmed (length longer than 16 chars). and then truncate the blob data.

can i fix issue by adding gsub( /^_binary/, "" ) after ?

# first replace \\ by \_ that mysqldump never generates to deal with
  # sequnces like \\n that should be translated into \n, not \<LF>.
  # After we convert all escapes we replace \_ by backslashes.
  gsub( /\\\\/, "\\_" )

  # single quotes are escaped by another single quote
  gsub( /\\'/, "''" )
  gsub( /\\n/, "\n" )
  gsub( /\\r/, "\r" )
  gsub( /\\"/, "\"" )
  gsub( /\\\032/, "\032" )  # substitute char

  gsub( /\\_/, "\\" )

PapaYofen avatar Jan 11 '22 03:01 PapaYofen

I'm really busy nowadays. Could you prepare a PR like https://github.com/dumblob/mysql2sqlite/pull/68/files (i.e. the patch + test cases) so that I can test it quickly? Thanks!

dumblob avatar Jan 12 '22 09:01 dumblob