Our mapping file is getting big and we have some tables with a lot of mappings. Performance degraded massively as we added more and more mappings per table. I hope there are some optimizations we can pull off to get the import up to speed again.

course of action

1.) profile a complete import run
2.) share ideas about performance improvements
3.) implement best rated ideas
4.) profile again
5.) go to 1.)

Sep 19 '12 07:09 senny

One problem has been identified and fixed: #47, do not use OpenStruct.
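A rough way to see why OpenStruct can hurt in a hot loop is to compare it with a plain Struct. This is only a sketch: `Row`, `legacy_id`, `name`, and the row count are made-up stand-ins, not the actual objects used in data-import, and the timings will vary by Ruby version.

```ruby
require 'benchmark'
require 'ostruct'

# Hypothetical row shape; the real rows in data-import are not shown here.
Row = Struct.new(:legacy_id, :name)

ROW_COUNT = 50_000

# Builds n OpenStruct instances, one per simulated row.
def build_open_structs(n)
  Array.new(n) { |i| OpenStruct.new(legacy_id: i, name: "row #{i}") }
end

# Builds n Struct instances with the same data.
def build_structs(n)
  Array.new(n) { |i| Row.new(i, "row #{i}") }
end

Benchmark.bm(12) do |x|
  x.report('OpenStruct') { build_open_structs(ROW_COUNT) }
  x.report('Struct')     { build_structs(ROW_COUNT) }
end
```

Running both reports on the same machine gives a quick feel for the per-object overhead before and after such a change.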

Sep 24 '12 15:09 senny

Next we should focus on GC. Of course migrating is very GC intensive, but I hope we can get it below the current 25%.

Sep 24 '12 15:09 senny

I tried different configurations for the GC. The best config I found was the one 37signals is using for their Rails servers.

export RUBY_HEAP_MIN_SLOTS=600000
export RUBY_HEAP_SLOTS_INCREMENT=10000  # standard
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1.8  # standard
export RUBY_GC_MALLOC_LIMIT=59000000
export RUBY_HEAP_FREE_MIN=100000

With this I brought the GC usage down to 20.6%, but the overall time used for the import changed only slightly.
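For reference, a GC share like the one above can be approximated with Ruby's built-in `GC::Profiler`. This is a minimal sketch, not how the numbers in this thread were necessarily obtained: `measure_gc_share` is a hypothetical helper, and the string allocation loop only stands in for a real import run.

```ruby
require 'benchmark'

# Runs the block and returns [wall_seconds, gc_seconds, gc_share_percent].
# GC::Profiler.total_time reports the time spent in GC, in seconds.
def measure_gc_share
  GC::Profiler.clear
  GC::Profiler.enable
  wall = Benchmark.realtime { yield }
  gc = GC::Profiler.total_time
  share = wall.zero? ? 0.0 : gc / wall * 100.0
  [wall, gc, share]
ensure
  GC::Profiler.disable
end

# Stand-in workload: allocate many short-lived strings.
wall, gc, share = measure_gc_share do
  200_000.times.map { |i| "row #{i}" }
end

puts format('wall: %.3fs, gc: %.3fs, gc share: %.1f%%', wall, gc, share)
```

Wrapping the actual import call in the block instead of the stand-in loop would give a comparable percentage for each GC configuration tried.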

I also tried modifying the parameters according to my own understanding of how the GC works, but I could never make the usage go below 20.6%.

I don't think this is worth taking into the library. At least not with my understanding of GC tuning.

Sep 26 '12 11:09 stmichael

@stmichael thanks for trying.

Sep 26 '12 11:09 senny

It may be a good idea to implement bulk insert functionality. Bulk inserts are much faster than regular inserts and are therefore well suited to data migration. From the docs of Postgres, MySQL and MSSQL I saw that all of them support bulk inserts. Although the syntax differs, they do more or less the same thing.
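One simple form of bulk insert is a multi-row `INSERT ... VALUES (...), (...)` statement. The sketch below only builds the SQL strings and is not part of data-import: `sql_literal` and `bulk_insert_statements` are hypothetical helpers, the quoting is deliberately simplified, and real code should rely on the database driver's own escaping. The vendor-specific bulk commands such as `COPY` or `BULK INSERT` are a different mechanism and are not shown here.

```ruby
# Quotes a Ruby value as a SQL literal. Simplified on purpose: it assumes
# standard SQL string quoting and handles only nil, numbers, and strings.
def sql_literal(value)
  case value
  when nil     then 'NULL'
  when Numeric then value.to_s
  else "'#{value.to_s.gsub("'", "''")}'"
  end
end

# Builds one multi-row INSERT statement per batch of rows.
def bulk_insert_statements(table, columns, rows, batch_size = 1000)
  rows.each_slice(batch_size).map do |batch|
    values = batch.map { |row| "(#{row.map { |v| sql_literal(v) }.join(', ')})" }
    "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')}"
  end
end

# Example with a made-up table and a batch size of 2.
rows = [[1, 'Ann'], [2, "O'Neil"], [3, nil]]
bulk_insert_statements('people', %w[id name], rows, 2).each { |sql| puts sql }
```

Batching keeps each statement at a bounded size, which matters because databases typically limit statement or packet length.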

Some additional optimizations with bulk inserts:

Disable foreign key checks
Delete indexes before the bulk insert starts and recreate them afterwards

Apparently SQLite doesn't support bulk inserts.

Sep 26 '12 12:09 stmichael
