data-import
optimise performance
Our mapping file is getting big, and some tables now have a lot of mappings. Performance degraded massively as we added more and more mappings per table. I hope there are some optimizations we can pull off to get the import up to speed again.
course of action
1.) profile a complete import run
2.) share ideas about performance improvements
3.) implement best rated ideas
4.) profile again
5.) go to 1.)
One problem has been identified and fixed: #47, do not use OpenStruct.
Next we should focus on GC. Of course migrating is very GC intensive, but I hope we can get the GC share below the current 25%.
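To track the GC share between runs, something like the following could help. It is only a minimal sketch using Ruby's built-in `GC::Profiler`. The `gc_share` helper and the workload inside the block are hypothetical stand-ins for a real import run, not part of the library.

```ruby
# Hypothetical helper: runs the block and reports how much of its
# wall-clock time was spent in garbage collection.
def gc_share
  GC::Profiler.enable
  GC::Profiler.clear
  started = Time.now
  yield
  elapsed = Time.now - started
  gc_time = GC::Profiler.total_time # seconds spent in GC since clear
  GC::Profiler.disable
  percent = elapsed > 0 ? gc_time / elapsed * 100 : 0.0
  { elapsed: elapsed, gc_time: gc_time, gc_percent: percent }
end

stats = gc_share do
  # Stand-in workload that allocates many short-lived objects,
  # roughly like mapping rows during an import.
  200_000.times.map { |i| { id: i, name: "row #{i}" } }
end

puts format('GC: %.1f%% of %.3fs', stats[:gc_percent], stats[:elapsed])
```

Wrapping the real import call in `gc_share` before and after a change gives comparable numbers without an external profiler.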
I tried different configurations for the GC. The best config I found was the one 37signals is using for their Rails servers.
export RUBY_HEAP_MIN_SLOTS=600000
export RUBY_HEAP_SLOTS_INCREMENT=10000 # standard
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1.8 # standard
export RUBY_GC_MALLOC_LIMIT=59000000
export RUBY_HEAP_FREE_MIN=100000
With this I brought the GC usage down to 20.6%, but the overall time used for the import changed only slightly.
I also tried modifying the parameters according to my own understanding of how the GC works, but I could never get the usage below 20.6%.
I don't think this is worth taking into the library. At least not with my understanding of GC tuning.
@stmichael thanks for trying.
It may be a good idea to implement bulk insert functionality. Bulk inserts are much faster than regular inserts and are therefore well suited to data migration. From the docs of Postgres, MySQL and MSSQL I saw that all of them support bulk inserts. Although the syntax differs, they do more or less the same thing.
Some additional optimizations with bulk inserts:
- Disable foreign key checks
- Delete indexes before the bulk insert starts and recreate them afterwards
Apparently SQLite doesn't support bulk inserts.
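To make the bulk insert idea a bit more concrete, here is a rough sketch of one simple variant: batching rows into multi-row `INSERT ... VALUES` statements. This is not the database-specific bulk loading syntax (each database has its own), and nothing here is existing library API. `bulk_insert_statements` and `quote` are hypothetical names, and the quoting is deliberately naive; real code should use the database adapter's own quoting.

```ruby
# Hypothetical helper: naive SQL literal quoting, for illustration only.
def quote(value)
  case value
  when nil then 'NULL'
  when Numeric then value.to_s
  else "'#{value.to_s.gsub("'", "''")}'"
  end
end

# Hypothetical helper: turns rows into multi-row INSERT statements,
# one statement per batch of at most batch_size rows.
def bulk_insert_statements(table, columns, rows, batch_size = 1000)
  rows.each_slice(batch_size).map do |batch|
    values = batch.map do |row|
      "(#{row.map { |v| quote(v) }.join(', ')})"
    end
    "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')}"
  end
end

rows = [[1, 'Alice'], [2, "O'Brien"], [3, nil]]
statements = bulk_insert_statements('people', %w[id name], rows, 2)
statements.each { |sql| puts sql }
```

With a batch size of 2, the three rows above end up in two statements instead of three single-row inserts. The default of 1000 is an assumption chosen to stay within the row limit MSSQL places on a single `VALUES` list.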