libadm icon indicating copy to clipboard operation
libadm copied to clipboard

document deepcopy is slow, especially with large documents

Open rsjbailey opened this issue 3 years ago • 4 comments

Document::deepCopy() performs poorly, particularly with large documents.

Two problems have been identified:

  • [ ] copy.cpp contains a lot of pass-by-value, passing std::map in this way in particular is expensive
  • [ ] adding an element to a document performs a duplicate check by doing a vector search (O(n)) for the element ID. During copy, this happens for every id in the original document, so copy is O(n^2) (or something approaching that, n grows as the document is copied)

The first is fairly trivial to fix. The second could be fixed by keeping document ids in a container with better than O(n) lookup (e.g. unordered_map) or maybe keeping them sorted and doing a binary search.

rsjbailey avatar May 05 '21 11:05 rsjbailey

I don't think keeping them sorted works unless you use a really fancy container, because insertion into a random access container is O(n), and binary search over a list is even worse.

tomjnixon avatar May 05 '21 12:05 tomjnixon

Looking at this a bit, it seems like the second problem only occurs for the top-level objects, not the audioBlockFormats, right?

There are not normally that many top-level elements, so it doesn't feel like an O(n^2) in there should be an issue unless each iteration is really inefficient.

tomjnixon avatar May 05 '21 12:05 tomjnixon

Yeah, I almost added a "it's not as bad as it looks" comment because of blockformats not being affected. For most 'normal' uses it's fine which is why it hasn't really cropped up, I think when I was testing it 500 objects was noticeably slow, which isn't a inconceivable number for more adventurous things.

The use case that originally brought it up was sADM, I think the prototype implementation was doing a copy every time a new block(?, not sure of the term) came in or something like that - possibly indirectly, I can't remember the specifics. Anyway, that meant it needed to be fast. Might be that a different approach to the sADM implementation would be a better solution, but copying a document feels like it shouldn't be a particularly heavyweight thing to do.

Good point on sorting vectors. Other alternatives that popped into my head include adding a batch add (so you could just sort & merge two vectors) or just disabling the check completely when doing a copy as if you're starting from a valid document and copying into an empty one, there shouldn't be any duplicates.

rsjbailey avatar May 05 '21 12:05 rsjbailey

Ah right. I'd still be tempted to benchmark it after sorting out the extra copies.

Otherwise I like the approach of just speeding up add, as that will help in other situations too.

tomjnixon avatar May 05 '21 13:05 tomjnixon