libadm
Document deepCopy is slow, especially with large documents
Document::deepCopy() performs poorly, particularly with large documents.
Two problems have been identified:
- [ ] copy.cpp contains a lot of pass-by-value; passing std::map this way is particularly expensive
- [ ] adding an element to a document performs a duplicate check via a linear O(n) vector search for the element ID. During a copy this happens for every ID in the original document, so the copy is O(n^2) (or close to it, since n grows as the document is copied)
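To illustrate the first point, here is a minimal sketch (the function names and map type are hypothetical, not the actual copy.cpp signatures) of the difference between taking a std::map by value and by const reference:

```cpp
#include <cstddef>
#include <map>
#include <string>

using IdMap = std::map<std::string, std::string>;

// expensive: every call copies the entire map, node by node
std::size_t countByValue(IdMap ids) { return ids.size(); }

// cheap: read-only access with no copy
std::size_t countByRef(const IdMap& ids) { return ids.size(); }
```

Both functions behave identically; only the call cost differs, which is why this class of fix is fairly mechanical.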
The first is fairly trivial to fix. The second could be fixed by keeping document IDs in a container with better-than-O(n) lookup (e.g. std::unordered_map), or perhaps by keeping them sorted and doing a binary search.
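The hash-based variant of the second fix could look roughly like this (a sketch under assumptions; IdRegistry and its methods are hypothetical names, not libadm's API). Keeping an unordered_set of IDs alongside the element storage makes the duplicate check O(1) on average instead of an O(n) vector scan:

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper: tracks seen element IDs for fast duplicate checks.
class IdRegistry {
 public:
  // returns false if the ID was already present (i.e. a duplicate)
  bool add(const std::string& id) { return ids_.insert(id).second; }

  bool contains(const std::string& id) const { return ids_.count(id) != 0; }

 private:
  std::unordered_set<std::string> ids_;
};
```

With this in place, copying an n-element document does n expected-O(1) checks rather than n O(n) scans.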
I don't think keeping them sorted works unless you use a really fancy container: insertion into a random-access container is O(n), and binary search over a linked list is even worse.
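A small sketch of the trade-off being described (the helper is illustrative, not proposed API): with a sorted std::vector the lookup is O(log n) via binary search, but the insert still has to shift every later element, so insertion remains O(n):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Insert id into a sorted vector unless it is already present.
bool sortedInsertIfAbsent(std::vector<std::string>& ids,
                          const std::string& id) {
  auto pos = std::lower_bound(ids.begin(), ids.end(), id);  // O(log n) search
  if (pos != ids.end() && *pos == id) return false;         // duplicate found
  ids.insert(pos, id);  // O(n): shifts everything after pos
  return true;
}
```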
Looking at this a bit, it seems like the second problem only occurs for the top-level objects, not for the audioBlockFormats, right?
There aren't normally that many top-level elements, so an O(n^2) there shouldn't be an issue unless each iteration is really inefficient.
Yeah, I almost added an "it's not as bad as it looks" comment because blockFormats aren't affected. For most 'normal' uses it's fine, which is why it hasn't really cropped up; when I was testing it, 500 objects was noticeably slow, which isn't an inconceivable number for more adventurous uses.
The use case that originally brought it up was sADM. I think the prototype implementation was doing a copy every time a new block(?, not sure of the term) came in, or something like that (possibly indirectly; I can't remember the specifics). Anyway, that meant it needed to be fast. A different approach to the sADM implementation might be a better solution, but copying a document feels like it shouldn't be a particularly heavyweight thing to do.
Good point on sorting vectors. Other alternatives that popped into my head: adding a batch add (so you could just sort and merge two vectors), or disabling the check completely when doing a copy, since if you're copying from a valid document into an empty one there shouldn't be any duplicates.
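The batch-add idea could be sketched like this (hypothetical helper, not libadm API): sort the incoming IDs once, then merge with the already-sorted existing IDs, so one O(m log m) sort plus an O(n + m) merge replaces m separate O(n) duplicate scans:

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Merge a batch of incoming IDs into an existing sorted ID list.
// Precondition: `existing` is already sorted.
std::vector<std::string> batchMerge(std::vector<std::string> existing,
                                    std::vector<std::string> incoming) {
  std::sort(incoming.begin(), incoming.end());
  std::vector<std::string> merged;
  merged.reserve(existing.size() + incoming.size());
  std::merge(existing.begin(), existing.end(),
             incoming.begin(), incoming.end(),
             std::back_inserter(merged));
  // duplicates, if any, are now adjacent, so they are cheap to detect or drop
  merged.erase(std::unique(merged.begin(), merged.end()), merged.end());
  return merged;
}
```

In a copy-into-empty-document scenario `existing` is empty, so this degenerates to a single sort, which matches the "there shouldn't be any duplicates" observation.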
Ah right. I'd still be tempted to benchmark it after sorting out the extra copies.
Otherwise I like the approach of just speeding up add, as that will help in other situations too.