hasten
Fails on binary field data
MySQL allows storing "binary" data types. mysqldump outputs these as raw byte strings, which results in invalid UTF-8 sequences that cause the following stack trace when the dump file is piped through hasten:
$ time zcat mysqldump.sql.gz | hasten | mysql test_hasten
/usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/lib/hasten/command.rb:6:in `match': invalid byte sequence in UTF-8 (ArgumentError)
from /usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/lib/hasten/command.rb:6:in `match'
from /usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/lib/hasten/command.rb:6:in `complete?'
from /usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/lib/hasten/dump.rb:36:in `parse_command'
from /usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/lib/hasten/dump.rb:13:in `execute'
from /usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/bin/hasten:16:in `<top (required)>'
from /usr/local/rvm/gems/ruby-1.9.3-p484/bin/hasten:23:in `load'
from /usr/local/rvm/gems/ruby-1.9.3-p484/bin/hasten:23:in `<main>'
from /usr/local/rvm/gems/ruby-1.9.3-p484/bin/ruby_executable_hooks:15:in `eval'
from /usr/local/rvm/gems/ruby-1.9.3-p484/bin/ruby_executable_hooks:15:in `<main>'
I imagine it would be rather difficult to identify these binary strings and treat them separately in order to avoid this. I guess the other alternative would be to load the file as ASCII-8BIT, but then I'm not sure what would happen with valid UTF-8 characters...
Bit of a pickle, not sure if there is a good solution, but wanted to at least get the issue out there. I was trying it out on one of our dumps to see if it would speed things up on restore, which we sorely need, but this is unfortunately a blocker for using hasten.
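For what it's worth, a rough sketch of that ASCII-8BIT idea (completely untested, and not how hasten currently works) would be to read stdin as raw bytes so reading never raises, build a sanitized copy purely for the regex matching, and write the original bytes through untouched so the binary data still reaches mysql intact:

STDIN.set_encoding(Encoding::ASCII_8BIT)
STDIN.each_line do |raw|
  # Build a sanitized copy used only for matching.
  matchable = raw.dup.force_encoding(Encoding::UTF_8)
  unless matchable.valid_encoding?
    # Replace invalid byte sequences via the usual UTF-16 round trip
    # (String#scrub only exists on Ruby >= 2.1).
    matchable = matchable.encode(Encoding::UTF_16, invalid: :replace, replace: '?')
                         .encode(Encoding::UTF_8)
  end
  # ... run hasten's command matching against matchable here ...
  STDOUT.write(raw) # emit the original, unmodified bytes
end

Valid UTF-8 lines skip the round trip entirely, so only genuinely invalid bytes would get replaced, and only in the copy used for matching.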
Thanks for the report! If you could make available an example dump containing such a binary string, I could tackle it. Or if you want to open a PR, I would gladly review it.
Test file available at https://gist.github.com/HerbCSO/2b4213cdc1e81df121e8 (uuencoded). I'd love to do a PR, but as I mentioned I'm not really sure how to approach it.
To reproduce, once you have the file uudecoded, run cat binary_data.sql | hasten | mysql binary_test. It should produce the following error:
/Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/lib/hasten/command.rb:6:in `match': invalid byte sequence in UTF-8 (ArgumentError)
from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/lib/hasten/command.rb:6:in `match'
from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/lib/hasten/command.rb:6:in `complete?'
from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/lib/hasten/dump.rb:36:in `parse_command'
from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/lib/hasten/dump.rb:13:in `execute'
from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/bin/hasten:16:in `<top (required)>'
from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/bin/hasten:23:in `load'
from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/bin/hasten:23:in `<main>'
+1
+1
+1
If anyone is interested, I've modified it here: https://github.com/thirtysixthspan/hasten/pull/2. It replaces the characters causing issues with '?'.
Adding the mysqldump option --hex-blob seems to be a workaround for this issue at the moment.
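That flag makes mysqldump write binary columns as hexadecimal literals (for example, 'abc' becomes 0x616263) instead of raw bytes, so the dump stays valid UTF-8. Something along these lines, with database and file names as placeholders:

mysqldump --hex-blob binary_test | gzip > mysqldump.sql.gz
zcat mysqldump.sql.gz | hasten | mysql binary_test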
I've made a bash script here: https://github.com/meme-lord/hasten_py/blob/master/hasten.sh that basically does what hasten does. hasten is a lot slower because it parses all the SQL for no real reason. The script also has no problems with encoding or binary data.
Interesting script, although that isn't quite what hasten does.
Hasten removes the key definitions from new tables, then adds them back at the end via ALTER statements.
Your script is interesting, and the autocommit=0 angle is a good one, but for very large databases I'm not sure a single commit at the very end is a great idea.
I am probably going to write my own script that disables autocommit and then adds a COMMIT after a configurable number of INSERT lines, as a happy medium between "COMMIT after every single INSERT" and "COMMIT only once at the end".
I think that should speed up INSERTs into large InnoDB tables considerably, without hasten's somewhat complex approach.
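Something like this (an untested sketch; BATCH is just a made-up knob, and the INSERT detection is deliberately naive):

BATCH = Integer(ENV.fetch('BATCH', 1000)) # INSERT statements per transaction
count = 0
puts 'SET autocommit=0;'
STDIN.each_line do |line|
  print line
  next unless line.start_with?('INSERT')
  count += 1
  puts 'COMMIT;' if (count % BATCH).zero?
end
puts 'COMMIT;' # flush whatever remains in the final batch

You'd use it the same way as hasten itself, e.g. zcat dump.sql.gz | ruby batch_commits.rb | mysql somedb, where batch_commits.rb is whatever you name the file.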
That's an excellent idea. I can try implementing it the next time I need to do a large import. I didn't spot the key definition thing last time I read through the source. Do you think grouping inserts would also give a speedup? Some SQL dumps I've seen have one INSERT statement per row.
Do you think grouping inserts would also give a speedup?
It definitely does, but I think that is best handled by mysqldump's --extended-insert flag:
Write INSERT statements using multiple-row syntax that includes several VALUES lists. This results in a smaller dump file and speeds up inserts when the file is reloaded.
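For illustration (table and values made up), instead of one statement per row:

INSERT INTO t VALUES (1,'a');
INSERT INTO t VALUES (2,'b');

--extended-insert (enabled by default in recent mysqldump versions as part of --opt) produces:

INSERT INTO t VALUES (1,'a'),(2,'b');

so the server parses and commits one statement instead of many.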