mosql
String not valid UTF-8 (BSON::InvalidStringEncoding)
I get the following exception when importing a collection. The data should be valid, since it is already present in the database.
Any ideas?
/var/lib/gems/1.9.1/gems/bson-1.10.2/lib/bson/bson_c.rb:20:in `serialize': String not valid UTF-8 (BSON::InvalidStringEncoding)
from /var/lib/gems/1.9.1/gems/bson-1.10.2/lib/bson/bson_c.rb:20:in `serialize'
from /var/lib/gems/1.9.1/gems/bson-1.10.2/lib/bson.rb:19:in `serialize'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/schema.rb:212:in `transform'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/streamer.rb:148:in `block (3 levels) in import_collection'
from /var/lib/gems/1.9.1/gems/mongo-1.10.2/lib/mongo/cursor.rb:335:in `each'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/streamer.rb:147:in `block (2 levels) in import_collection'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/streamer.rb:71:in `block in with_retries'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/streamer.rb:69:in `times'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/streamer.rb:69:in `with_retries'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/streamer.rb:146:in `block in import_collection'
from /var/lib/gems/1.9.1/gems/mongo-1.10.2/lib/mongo/collection.rb:291:in `find'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/streamer.rb:145:in `import_collection'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/streamer.rb:123:in `block (2 levels) in initial_import'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/streamer.rb:121:in `each'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/streamer.rb:121:in `block in initial_import'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/streamer.rb:109:in `each'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/streamer.rb:109:in `initial_import'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/streamer.rb:28:in `import'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/cli.rb:167:in `run'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/lib/mosql/cli.rb:16:in `run'
from /var/lib/gems/1.9.1/gems/mosql-0.4.2/bin/mosql:5:in `<top (required)>'
from /usr/local/bin/mosql:23:in `load'
from /usr/local/bin/mosql:23:in `<main>'
Please note this is failing even with the --unsafe flag.
Any update on this one?
I had the same issue. I monkey patched it to remove the invalid k,v pairs from the obj. I replaced the mosql binary with the following script, which I call monkey-patched-mosql, and I run the ETL process from it. The script overrides the MoSQL::Schema.transform method. It could be cleaned up by calling super.
The ETL errors from my data were caused by binary values and larger-than-expected BSON documents.
#!/usr/bin/env ruby
require 'mosql/cli'

module MoSQL
  class Schema
    def transform(ns, obj, schema=nil, depth = 0)
      schema ||= find_ns!(ns)

      original = obj

      # Do a deep clone, because we're potentially going to be
      # mutating embedded objects.
      obj = BSON.deserialize(BSON.serialize(obj))

      row = []
      schema[:columns].each do |col|
        source = col[:source]
        type = col[:type]

        if source.start_with?("$")
          v = fetch_special_source(obj, source, original)
        else
          v = fetch_and_delete_dotted(obj, source)
          case v
          when Hash
            v = JSON.dump(Hash[v.map { |k,v| [k, transform_primitive(v)] }])
          when Array
            v = v.map { |it| transform_primitive(it) }
            if col[:array_type]
              v = Sequel.pg_array(v, col[:array_type])
            else
              v = JSON.dump(v)
            end
          else
            v = transform_primitive(v, type)
          end
        end
        row << v
      end

      if schema[:meta][:extra_props]
        extra = sanitize(obj)
        row << JSON.dump(extra)
      end

      log.debug { "Transformed: #{row.inspect}" }

      row
    rescue BSON::InvalidStringEncoding, BSON::InvalidDocument
      # Keep only the top-level k,v pairs that BSON can round-trip on their own.
      obj = obj.select do |k,v|
        begin
          BSON.deserialize(BSON.serialize({"#{k}" => v}))
          true
        rescue BSON::InvalidStringEncoding, BSON::InvalidDocument
          puts "Pruning #{k} from the hash."
          false
        end
      end
      # Give up after a few attempts instead of retrying forever.
      raise "tried and failed to prune with #{[ns, obj, schema]}" if depth > 2
      transform(ns, obj, schema, depth + 1)
    end
  end
end

MoSQL::CLI.run(ARGV)
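Before pruning, it can help to see which fields are actually at fault. The following is a minimal sketch in plain Ruby, with no MoSQL or bson dependency, that walks a document and lists the dotted paths of strings whose bytes are not valid UTF-8. The helper name invalid_utf8_paths and the sample document are made up for illustration, and String#valid_encoding? is only an approximation of the check the bson gem performs, so treat the result as a hint rather than a guarantee. It needs Ruby 2.0 or later for String#b, and it only inspects values, not keys.

```ruby
# Hypothetical helper, not part of MoSQL or the bson gem.
# Returns the dotted paths of all String values whose bytes are not valid UTF-8.
def invalid_utf8_paths(value, path = nil)
  # Build "parent.child" paths, leaving top-level keys unprefixed.
  join = lambda { |key| path ? "#{path}.#{key}" : key.to_s }

  case value
  when Hash
    value.flat_map { |k, v| invalid_utf8_paths(v, join.call(k)) }
  when Array
    value.each_with_index.flat_map { |v, i| invalid_utf8_paths(v, join.call(i)) }
  when String
    # Reinterpret the raw bytes as UTF-8 and ask Ruby whether they are well formed.
    utf8 = value.dup.force_encoding(Encoding::UTF_8)
    utf8.valid_encoding? ? [] : [path.to_s]
  else
    []
  end
end

# Made-up sample document: binary data stored as strings in several places.
doc = {
  "name"   => "ok",
  "blob"   => "\xFF\xFE".b,                                   # raw binary bytes
  "nested" => { "note" => "caf\xC3\xA9", "bad" => "abc\xC3" }, # valid, then truncated
  "tags"   => ["fine", "\x80"]
}

p invalid_utf8_paths(doc)
# => ["blob", "nested.bad", "tags.1"]
```

Unlike the patch above, which drops a whole top-level key when any part of it fails to serialize, this reports nested offenders individually, so the bad values can be fixed at the source or mapped to a binary type instead of being discarded.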
+1 - anyone know what would cause this? I checked the timestamp that it appears to be failing on and I don't see any issues.
Looks like there was a PR open to resolve this (#83), but it broke tests.