drush
Replace usage of in_array() in MigrateExecutable::handleMissingSourceRows
Describe the bug
Using in_array() in MigrateExecutable::handleMissingSourceRows() is very inefficient for migrations with a very large number of rows.
To Reproduce
Run any migration with a very large number of rows (e.g. 10,000+).
The migration itself shows a progress bar and reports when it has finished, but the logic in handleMissingSourceRows() that runs afterwards makes the process appear frozen for an indeterminate amount of time.
Actual behavior
Running a migration with many rows (in my case over 300,000 for upgrade_d7_file_private) took roughly 20-30 minutes for the migration itself, but then hung in MigrateExecutable::handleMissingSourceRows() for multiple hours until I had to stop the process manually.
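in_array() can be very inefficient because it performs a linear scan, comparing the needle against every array value until it finds a match. On top of that, the current logic searches for an array within an array of arrays, so every one of those comparisons is itself an array comparison. Because this lookup runs once for every row in the ID map, the total work grows roughly quadratically with the number of rows.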
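To make the cost concrete, here is a small Python sketch of the same access pattern (it is not the Drush code): each ID map row is checked by scanning the list of seen source ID arrays from the front, as in_array() does, and the sketch counts how many array comparisons that takes. Assuming every ID map row is still present and rows are visited in the order they were seen, the scan costs 1 + 2 + ... + n = n(n+1)/2 comparisons, which for n = 300,000 is 45,000,150,000 array comparisons. The function name find_missing_linear is hypothetical.

```python
from typing import Any, List, Sequence, Tuple


def find_missing_linear(
    map_source_ids: List[Sequence[Any]],
    all_source_id_values: List[Sequence[Any]],
) -> Tuple[List[Sequence[Any]], int]:
    """Return (ID map rows not seen in the source, number of comparisons made).

    Hypothetical illustration: each ID map row triggers a front-to-back scan
    of every seen row, the same pattern as
    in_array($mapSourceId, $this->allSourceIdValues) in PHP.
    """
    missing: List[Sequence[Any]] = []
    comparisons = 0
    for ids in map_source_ids:
        found = False
        for candidate in all_source_id_values:
            comparisons += 1
            # Each step compares a whole list of source IDs, not a scalar.
            if list(candidate) == list(ids):
                found = True
                break
        if not found:
            missing.append(ids)
    return missing, comparisons


if __name__ == "__main__":
    seen = [[i, "en"] for i in range(1, 5)]
    id_map = [[i, "en"] for i in range(1, 6)]  # row 5 is no longer in the source
    missing, comparisons = find_missing_linear(id_map, seen)
    print(missing, comparisons)
```

With only 5 rows this already takes 14 comparisons (1 + 2 + 3 + 4 for the rows that are found, plus a full scan of 4 for the missing one); the count keeps growing with the square of the row count.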
Workaround
Instead of using in_array(), the $allSourceIdValues property should be keyed by a unique ID derived from each row's source ID values, so that a lookup can use isset() (a hash lookup) instead of a linear scan. A dedicated method that builds this key from the source ID values can then be used both when writing to the $allSourceIdValues property in MigrateExecutable::onPrepareRow() and when reading from it in handleMissingSourceRows().
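A minimal Python sketch of the keyed approach follows, assuming source IDs are scalar ints or strings. The helper build_source_id_key and the class MigrationRunSketch are hypothetical names, not Drush APIs, and the key format (values cast to strings, then JSON-encoded) is an assumption chosen so that 1 and "1" map to the same key and values containing a separator character cannot collide. The dict membership test plays the role of isset($this->allSourceIdValues[$key]) in PHP.

```python
import json
from typing import Any, Dict, Iterable, List, Sequence


def build_source_id_key(source_id_values: Sequence[Any]) -> str:
    """Hypothetical helper: derive one string key from a row's source ID values.

    Casting to str makes 1 and "1" share a key (assumes scalar IDs);
    JSON-encoding the list keeps ["a:b", "c"] and ["a", "b:c"] distinct.
    """
    return json.dumps([str(v) for v in source_id_values])


class MigrationRunSketch:
    """Illustrative stand-in for the executable's bookkeeping, not Drush code."""

    def __init__(self) -> None:
        # Keyed like a PHP associative array: key -> original source ID values.
        self.all_source_id_values: Dict[str, List[Any]] = {}

    def on_prepare_row(self, source_id_values: Sequence[Any]) -> None:
        # Write side: store each row seen in the source under its derived key.
        key = build_source_id_key(source_id_values)
        self.all_source_id_values[key] = list(source_id_values)

    def handle_missing_source_rows(
        self, map_source_ids: Iterable[Sequence[Any]]
    ) -> List[List[Any]]:
        # Read side: one average O(1) dict lookup per ID map row instead of
        # a scan over every seen row.
        missing: List[List[Any]] = []
        for ids in map_source_ids:
            if build_source_id_key(ids) not in self.all_source_id_values:
                missing.append(list(ids))
        return missing


if __name__ == "__main__":
    run = MigrationRunSketch()
    run.on_prepare_row([1, "en"])
    run.on_prepare_row([2, "en"])
    # ["2", "en"] matches [2, "en"] because both produce the same key.
    print(run.handle_missing_source_rows([[1, "en"], ["2", "en"], [3, "en"]]))
```

Using the same helper on both the write and read side is what keeps the keys consistent. In PHP, note that isset() returns false for a key whose value is null, so storing the (non-null) source ID array as the value, or using array_key_exists(), avoids false "missing" results.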
Applying this change to the 300k-row example above brought this post-migration step down from multiple hours to a few minutes.