Differ icon indicating copy to clipboard operation
Differ copied to clipboard

High Memory Usage extended diffing two arrays of quite differently ordered results

Open tysonkerridge opened this issue 3 years ago • 0 comments

Hello, I've come across a particular case where I'm seeing a really high memory usage and wanted to make sure I wasn't doing something wrong or extended diffing in some unintended way.

I was extended diffing two seemingly small arrays of items which happened to be the same (or very similar, give or take a few items) but in two different orders. Though my data was from a different source, I've reproduced the issue with the following example but just creating an array of 5,000 random Ints, uniquing and then shuffling the first array of items to create the second set of items to diff.

DispatchQueue(label: "test").async {
   let pool = (0...999_999)
   let getRandom: () -> [Int] = {
       (0..<5_000).map { _ in
           pool.randomElement()!
       }
   }
   let previous = Set(getRandom()).shuffled()
   let current = previous.shuffled()
   let before = Date()
   let _ = previous.extendedDiff(current)
   print("finished. took:", Date().timeIntervalSince(before))
}

Running this code appears to reach a memory usage of around ~1GB, and take ~27 seconds. For reference, 10,000 items seems to hit ~6GB and take ~111 seconds.

Also for reference, using two random sets of 10,000 items also hits about ~5GB and takes ~113 seconds. eg.:

let previous = Set(getRandom()).sorted()
let current = Set(getRandom()).sorted()

The reason this is a problem is that iOS (or at least debugging device in Xcode) seems to have a memory limit of about 2GB before you get terminated.

In my case, all I really care about is the deleted indexes and not the order, so I've found that in the case where the two arrays are very similar but in different orders, if I sort both arrays of items beforehand so that they're as similar as possible and then run the diff, it takes negligible memory and finishes in no time at all even with 10,000 items. Unfortunately in the case of two random sets of items, sorting won't help and will hit those limits/take a while.

I just wanted to make sure that I'm not doing something else wrong here, or if there's a different approach I should be taking for extended diffing two sets of items that could for all I know be two sets of different items.

Edit: Clarify that I was extended diffing (both seem to be problematic, but still)

tysonkerridge avatar Apr 21 '21 04:04 tysonkerridge