
Bad performance for Merge with auto-detect enabled

Open stempler opened this issue 7 years ago • 2 comments

The auto-detect feature in the Merge configuration seems to cause very poor performance during transformation. The problem appears to have been introduced in version 3.2.0 with this PR: https://github.com/halestudio/hale/pull/285.

The problem occurs when a significant number of instances are merged together within a single Merge, because the comparisons done for auto-detect are O(n²) in the number of merged instances. The number of attributes being compared is an additional factor.
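To put the quadratic growth into numbers, here is a small back-of-the-envelope sketch. It assumes that every pair of instances in a merged group is compared once per attribute; that is a simplified model for illustration, not hale's actual code, and the function name `pairwise_comparisons` is made up.

```python
def pairwise_comparisons(group_size: int, attribute_count: int = 1) -> int:
    """Number of comparisons if every pair of instances in one merged group
    is compared once per attribute (assumed model, not hale's implementation)."""
    pairs = group_size * (group_size - 1) // 2  # n choose 2
    return pairs * attribute_count


# Growth for a few group sizes, with 1 attribute and with a hypothetical 20.
for n in (10, 100, 1000):
    print(n, pairwise_comparisons(n), pairwise_comparisons(n, attribute_count=20))
```

Under this model a group of 1000 instances already needs 499,500 pair comparisons per attribute, so a handful of large groups can dominate the whole run.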

Example from a medium-sized data set (~100k instances, up to ~1000 instances merged together):

  • Transformation w/o auto-detect: ~3 minutes
  • Transformation w/ auto-detect: more than 3 hours

Right now the workaround is to explicitly configure properties in the Merge configuration and leave the auto-detect feature turned off.
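As a rough illustration of why explicitly configured properties can be cheaper, the sketch below groups instances by a fixed key in a single pass using a hash map, so no instance-to-instance comparison is needed. This is only an assumption about the general technique, not a description of hale's Merge implementation; `merge_with_explicit_keys`, the dict-based instances and the property names are all hypothetical.

```python
from collections import defaultdict


def merge_with_explicit_keys(instances, key_properties):
    """Group instances by the configured key properties, then collect the
    remaining property values per group (hypothetical sketch, not hale code)."""
    groups = defaultdict(list)
    for instance in instances:
        # One hash lookup per instance instead of comparing it to all others.
        key = tuple(instance[p] for p in key_properties)
        groups[key].append(instance)

    merged = []
    for key, members in groups.items():
        result = dict(zip(key_properties, key))
        for member in members:
            for name, value in member.items():
                if name not in key_properties:
                    result.setdefault(name, []).append(value)
        merged.append(result)
    return merged


# Hypothetical input: two instances share id 1 and end up in one merged result.
instances = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "b"},
    {"id": 2, "name": "c"},
]
print(merge_with_explicit_keys(instances, ["id"]))
```

The work here grows linearly with the number of instances and properties, which is consistent with the much shorter run time reported without auto-detect.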

stempler, Jun 12 '18 07:06

Add a comment to make clear that auto-merge affects performance negatively.

thorsten-reitz, Jun 15 '18 14:06

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

github-actions[bot], May 19 '24 02:05