hale
Bad performance for Merge with auto-detect enabled
The auto-detect feature in the Merge configuration seems to cause very poor performance during the transformation. The problem appears to have been introduced as early as version 3.2.0, with this PR: https://github.com/halestudio/hale/pull/285.
The problem occurs when a significant number of instances are merged together within a single Merge, because the comparisons done for auto-detect are O(n²) in the number of merged instances. A large number of compared attributes makes it worse.
Example from a medium-sized data set (~100k instances, up to ~1000 instances merged together):
- Transformation w/o auto-detect: ~3 minutes
- Transformation w/ auto-detect: more than 3 hours
Right now the workaround is to explicitly configure properties in the Merge configuration and leave the auto-detect feature turned off.
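To illustrate why the workaround helps, here is a rough sketch in Python. It is not hale's actual code: `pairwise_comparisons` and `group_by_key` are hypothetical helpers, and the model assumes auto-detect compares every pair of instances in a merge group on every attribute, while explicitly configured properties allow grouping with a single hash lookup per instance.

```python
from collections import defaultdict


def pairwise_comparisons(group_size, num_attributes):
    """Attribute comparisons needed if every pair of instances in a
    group is compared on every attribute (assumed O(n^2) model)."""
    pairs = group_size * (group_size - 1) // 2
    return pairs * num_attributes


def group_by_key(instances, key_properties):
    """Group instances by explicitly configured properties.
    Each instance costs one hash lookup, so the work is linear."""
    groups = defaultdict(list)
    for inst in instances:
        key = tuple(inst[p] for p in key_properties)
        groups[key].append(inst)
    return dict(groups)


# Hypothetical sample data: two instances share id 1.
instances = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "b"},
    {"id": 2, "name": "c"},
]
groups = group_by_key(instances, ["id"])
print(len(groups))  # 2 merge groups

# One group of 1000 instances with 10 compared attributes (assumed count).
print(pairwise_comparisons(1000, 10))  # 4995000
```

Under this model a single group of 1000 instances already needs roughly five million attribute comparisons, whereas key-based grouping touches each instance once.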
Add a comment to make clear that auto-merge affects performance negatively.
This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.