bulk_insert icon indicating copy to clipboard operation
bulk_insert copied to clipboard

Use #exec_insert over #exec_query because there's no field selections

Open daemonsy opened this issue 6 years ago • 8 comments

What's up?

Hi there,

Thanks for the gem! It's awesome. While I was trying it, I got an error on calling worker.save!.

NoMethodError: undefined method `fields' for nil:NilClass
	from /Users/saw/.gem/ruby/2.3.6/gems/activerecord-3.2.22.9/lib/active_record/connection_adapters/mysql2_adapter.rb:219:in `exec_query'
	from (irb):2
	from /Users/saw/.gem/ruby/2.3.6/gems/railties-3.2.22.9/lib/rails/commands/console.rb:47:in `start'
	from /Users/saw/.gem/ruby/2.3.6/gems/railties-3.2.22.9/lib/rails/commands/console.rb:8:in `start'
	from /Users/saw/.gem/ruby/2.3.6/gems/railties-3.2.22.9/lib/rails/commands.rb:41:in `<top (required)>'
	from script/rails:6:in `require'
	from script/rails:6:in `<main>'

Investigation

It seems that the basic skeleton code of Connection#exec_query expects a result set that has field selections. However, since we're doing a bulk insert, it doesn't seem like this will work.

def exec_query(sql, name = 'SQL', binds = [])
  result = execute(sql, name)
  ActiveRecord::Result.new(result.fields, result.to_a)
end

However, there is an Connection#exec_insert method that does not expect return results.

I suspect exec_query is used because of return_primary_keys where we call a SELECT after the insert.

What this does

  • Use exec_query when there primary keys are required
  • Use exec_insert otherwise

daemonsy avatar Jul 18 '18 16:07 daemonsy

Hi @jamis a small bump, hopefully you will have a chance to take a look in the coming days. Have a good weekend.

daemonsy avatar Jul 20 '18 17:07 daemonsy

It seems that at least it didn't error on the mysql2 adapter.

I'll test it on Rails 4/5 and also Postgres and let you know what I find

daemonsy avatar Jul 31 '18 20:07 daemonsy

It should be fine for postgresql I guess. For MySQL is a little bit more complicated since you cannot emulate the behavior of postgresql's returning out of the box:

Returning equivalent in MySQL https://stackoverflow.com/q/9027008/5687152

For a single insertion it would make sense to retrieve LAST_INSERT_ID().

However for bulk insert is more complicates:

https://stackoverflow.com/q/7333524/5687152

InnoDB guarantees sequential numbers for AUTO INCREMENT when doing bulk inserts, provided innodb_autoinc_lock_mode is set to 0 (traditional) or 1 (consecutive). Consequently you can get the first ID from LAST_INSERT_ID() and the last by adding ROW_COUNT()-1.

Even if we would like to implement this combination gets quite hard in cases like the one I am facing with a MySQL DB where the autoincrement offset is changing according to the table.

More details on the keywords from an old (and still used a lot) MySQL version doc:

https://dev.mysql.com/doc/refman/5.6/en/information-functions.html#function_last-insert-id

I am not excluding that there is a smart and straightforward way to make pg logic available in this cases. (E.g. it could make sense to use this combination if you can easily retrieve the autoincrement offset and you retrieve the row diff and the last id within the same transaction as the insert)

mberlanda avatar Jul 31 '18 21:07 mberlanda

Sorry to be slow to chime in here. Yeah, the "return primary keys" is Postgres-only, and as @mberlanda has pointed out, it's non-trivial to implement in MySQL (or, AFAIK, other databases in general).

I think the right way forward, here, is to only use exec_query when @return_primary_keys is true, and the adapter is Postgres. Otherwise, it should use execute or exec_insert.

(Interestingly, the default implementation of exec_insert calls exec_query...though perhaps not in the MySQL adapter?)

jamis avatar Aug 01 '18 20:08 jamis

I have checked the current implementation of exec_query for MySQL adapter on the ActiveRecord master branch:

https://github.com/rails/rails/blob/master/activerecord/lib/active_record/connection_adapters/mysql/database_statements.rb

def exec_query(sql, name = "SQL", binds = [], prepare: false)
  if without_prepared_statement?(binds)
    execute_and_free(sql, name) do |result|
      ActiveRecord::Result.new(result.fields, result.to_a) if result
    end
  else
    exec_stmt_and_free(sql, name, binds, cache_stmt: prepare) do |_, result|
      ActiveRecord::Result.new(result.fields, result.to_a) if result
    end
  end
end

This is preventing from raising NoMethodError: undefined method 'fields' for nil:NilClass. However, since no else is provided the result returned will be nil.

I will suggest to create a custom error and raise when return primary_key is provided with adapters other than pg. I can do a quick pr!

mberlanda avatar Aug 02 '18 05:08 mberlanda

Hey @mberlanda @jamis sorry I haven't dug into the code, I see that Mauro is on top of it already. To take a step back, the issue I uncovered was that undefined method fields' for nil:NilClass` and it is caused bulk insert returning nothing (mysql) and that caused the error instantiating the result set.

At that point, my thinking is just that exec_query is used when you except results and exec_insert doesn't expect results.

Following Mauro's investigation and reading the code for the different adapters, that seems to be case. In PG's case, exec_query just naively instantiates results. In MySQL's case (copied from above), it has a guard clause.

**NB: PG's exec_insert seem to have something that deals with the return_primary_key option. I'll take a look. **

I agree with @jamis 's comment, we should just use the two methods accordingly. I can update the PR to make sure it works with both PG + (return primary keys) and MySQL.

exec_query

PG

def exec_query(sql, name = "SQL", binds = [], prepare: false)
  execute_and_clear(sql, name, binds, prepare: prepare) do |result|
    types = {}
    fields = result.fields
    fields.each_with_index do |fname, i|
      ftype = result.ftype i
      fmod  = result.fmod i
      types[fname] = get_oid_type(ftype, fmod, fname)
    end
    ActiveRecord::Result.new(fields, result.values, types)
  end
end

https://github.com/rails/rails/blob/master/activerecord/lib/active_record/connection_adapters/postgresql/database_statements.rb#L80

MySQL

def exec_query(sql, name = "SQL", binds = [], prepare: false)
  if without_prepared_statement?(binds)
    execute_and_free(sql, name) do |result|
      ActiveRecord::Result.new(result.fields, result.to_a) if result
    end
  else
    exec_stmt_and_free(sql, name, binds, cache_stmt: prepare) do |_, result|
      ActiveRecord::Result.new(result.fields, result.to_a) if result
    end
  end
end

https://github.com/rails/rails/blob/master/activerecord/lib/active_record/connection_adapters/mysql/database_statements.rb#L31

daemonsy avatar Aug 02 '18 15:08 daemonsy

Sorry I haven't acted on this. Will try to rally sometime to do it within this week.

daemonsy avatar Sep 11 '18 14:09 daemonsy

ok! Don't worry! I hope I will have the time to finish soon #37 to allow testing against the real database without mocking the adapter. In this case we should be able to validate that this has the expected behaviour for mysql2 adapter 😄

mberlanda avatar Sep 11 '18 14:09 mberlanda