
Improved performance of LeafNode ValueIters

Open Idclip opened this issue 8 months ago • 3 comments

Idclip avatar Apr 06 '25 06:04 Idclip

I've thoroughly tested this with GCC 11 and GCC 13 today. I found substantial performance improvements of around 2x with both compilers, which is fantastic! However, I couldn't detect any performance difference from the assume optimization on either compiler; possibly some of my compiler flags conflict with it, or it's something about the environment I'm testing in. It may be more beneficial on Clang.

I also confirmed what you said last time that the potential deallocation issue is the same as the current implementation, so no need to discuss that any further.

I did notice one thing that I wanted to highlight. The iterators are currently either default-initialized or initialized with a leaf node pointer. If you default-initialize, or initialize with a nullptr instead of a leaf node pointer, the new implementation attempts to dereference the nullptr, whereas the previous implementation called parent(), which threw an exception if the parent was a nullptr. I'm not sure what the correct solution is here, but at least asserting the parent pointer before dereferencing would help track down a potential issue? (I do think we should change the logic here to expect that the parent node is not a nullptr on initialization.) To reproduce:

openvdb::FloatTree::LeafNodeType::ValueOnIter iter(leaf.valueMask().beginOn(), nullptr);
iter.setValue(1.0f);

I also wanted to mention that I found a bit more performance on both compilers by breaking the leaf node iterator logic out into a separate header (tree/LeafIterator.h) and collapsing the layers of hierarchy (iteratorbase->sparseiteratorbase->valueiter) down into a single iterator class. Presumably the compiler has more opportunity to optimize with a bit less complexity in the design. It might be worth us considering this.

danrbailey avatar Apr 23 '25 00:04 danrbailey

However, I couldn't detect any performance difference from the assume optimization on either compiler; possibly some of my compiler flags conflict with it, or it's something about the environment I'm testing in. It may be more beneficial on Clang.

Getting the optimizer to trigger in the way I wanted was very temperamental. I also experimented with splitting the LeafBuffer into two ABI-identical implementations: one with delay loading, one without. That also triggered the optimization I wanted, but it was obviously a larger change set. I'm not super happy about the assume (it is a bit awkward), but I don't think it's a bad solution in lieu of more refactoring.

I also confirmed what you said last time that the potential deallocation issue is the same as the current implementation, so no need to discuss that any further.

Thoughts on the impl of swap()? I could deprecate this and introduce an unsafeSwap

I'm not sure what the correct solution is here, but at least asserting the parent pointer before dereferencing would help track down a potential issue? (I do think we should change the logic here to expect that the parent node is not a nullptr on initialization.)

Can definitely add an assert. I can also deprecate that constructor and introduce one that takes a reference. Ideally I'd want to remove the default constructor as well, but I'm not 100% sure of the consequences of that yet.

I also wanted to mention that I found a bit more performance on both compilers by breaking the leaf node iterator logic out into a separate header (tree/LeafIterator.h) and collapsing the layers of hierarchy (iteratorbase->sparseiteratorbase->valueiter) down into a single iterator class.

Did you try adding final to the ValueIter? I bet LTO/IPO would help heaps here too. But yeah, this is similar to the code reduction and performance improvements I observed when splitting the delay-loading code out of the LeafBuffer: a lot of it is ifdef'ed out, but some isn't, and it is having an impact.

Idclip avatar Apr 23 '25 04:04 Idclip

Thoughts on the impl of swap()? I could deprecate this and introduce an unsafeSwap

I don't think that's necessary. It's only different in a very niche case: when you have a value iterator pointing at a leaf, you swap the leaf, and then try to access the value iterator. It's common to expect the iterator itself to be left in an invalid state if you change the underlying data in any way, so I don't think we need to do anything special to handle this change in behavior.

Can definitely add an assert. I can also deprecate that constructor and introduce one that takes a reference. Ideally I'd want to remove the default constructor as well, but I'm not 100% sure of the consequences of that yet.

Yeah, the assert is better than nothing, would be good to add that in.

Did you try adding final to the ValueIter? I bet LTO/IPO would help heaps here too. But yeah, this is similar to the code reduction and performance improvements I observed when splitting the delay-loading code out of the LeafBuffer: a lot of it is ifdef'ed out, but some isn't, and it is having an impact.

No, I didn't try that, but we should definitely put final anywhere that we can.

I'm going to approve regardless. It would be nice to add that assert, but that's a minor comment; it looks good to me! Thanks for doing this. I have some other performance improvements I wanted to push up, so I suppose I should re-run my benchmarks to see if they're still faster after this change. :)

danrbailey avatar May 10 '25 04:05 danrbailey