ArborX icon indicating copy to clipboard operation
ArborX copied to clipboard

Add a very light non-performant version of APIv1 back

Open aprokop opened this issue 6 months ago • 12 comments

Performant version of using PairValueIndex is really challenging to grasp initially. For new users, this introduces too steep of a learning curve, potentially making people give up.

Instead, it may make more sense to provide something much easier even if it does not have the best performance.

This PR introduces a new constructor to BVH and BruteForce:

Index(ExecutionSpace, int, indexable_getter)

This way, users will only have to write indexable getter, which should provide a lower barrier for entry. Because we won't store the leaf nodes' geometries, it will result in a penalty during the traversal.

I'd like to hear the thoughts.

aprokop avatar Jun 24 '25 20:06 aprokop

Since we had to reimplement Iota several times, it makes sense to move it to ArborX. I am also in favor of moving ExtractIndex to ArborX.

Because we won't store the leaf nodes' geometries, it will result in a penalty during the traversal.

Do we pay the penalty only on the leaf nodes? Is the penalty just that we need to access user data instead of access it directly inside ArborX?

Rombur avatar Jun 25 '25 12:06 Rombur

Since we had to reimplement Iota several times, it makes sense to move it to ArborX.

I'm not opposed to moving Iota to ArborX::. Why did you need to implement it?

Do we pay the penalty only on the leaf nodes? Is the penalty just that we need to access user data instead of access it directly inside ArborX?

Yes, only on the leaf nodes, internal nodes don't change. The penalty is due to both indirect reference and the fact that a user may be constructing geometric object on the fly which could be slow.

aprokop avatar Jun 25 '25 13:06 aprokop

I'm not opposed to moving Iota to ArborX::. Why did you need to implement it?

I meant moving Iota inside the library. It's fine in the current namespace.

The penalty is due to both indirect reference and the fact that a user may be constructing geometric object on the fly which could be slow.

I expect that the indirect reference would be small in most cases. If you are constructing object on the fly, it's fair to ask you to use a more advanced workflow.

It would be nice to run a benchmark to compare the performance difference but otherwise I like it.

Rombur avatar Jun 25 '25 13:06 Rombur

It would be nice to run a benchmark to compare the performance difference but otherwise I like it.

I did do that at some point (see here). Would need to update and rerun that benchmark and post the numbers.

aprokop avatar Jun 25 '25 14:06 aprokop

Here are the results on Frontier (attach meaning pair of value and index, iota meaning storing just integers and relying on indexable getter) for the benchmark (code is here):

-------------------------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
BM_benchmark_attach/100/manual_time                     0.355 ms        0.388 ms         1970 rate=281.455k/s
BM_benchmark_attach/10000/manual_time                   0.723 ms        0.757 ms          957 rate=13.8222M/s
BM_benchmark_attach/1000000/manual_time                  11.8 ms         12.1 ms           59 rate=84.8921M/s
BM_benchmark_attach/100000000/manual_time                1183 ms         1184 ms            1 rate=84.5039M/s
BM_benchmark_attach_callback/100/manual_time            0.164 ms        0.165 ms         4291 rate=608.884k/s
BM_benchmark_attach_callback/10000/manual_time          0.367 ms        0.368 ms         1911 rate=27.2469M/s
BM_benchmark_attach_callback/1000000/manual_time         6.64 ms         6.64 ms           94 rate=150.626M/s
BM_benchmark_attach_callback/100000000/manual_time        756 ms          756 ms            1 rate=132.298M/s
BM_benchmark_iota/100/manual_time                       0.368 ms        0.400 ms         1900 rate=272.071k/s
BM_benchmark_iota/10000/manual_time                     0.772 ms        0.807 ms          901 rate=12.9505M/s
BM_benchmark_iota/1000000/manual_time                    15.2 ms         15.5 ms           46 rate=65.8698M/s
BM_benchmark_iota/100000000/manual_time                  1696 ms         1696 ms            1 rate=58.9631M/s
BM_benchmark_iota_callback/100/manual_time              0.175 ms        0.175 ms         4027 rate=573.055k/s
BM_benchmark_iota_callback/10000/manual_time            0.393 ms        0.394 ms         1778 rate=25.421M/s
BM_benchmark_iota_callback/1000000/manual_time           8.72 ms         8.72 ms           80 rate=114.619M/s
BM_benchmark_iota_callback/100000000/manual_time          971 ms          971 ms            1 rate=102.955M/s

The difference grows with size. When storing the neighbor indices, the slowdown on 100M is 43%. For pure callbacks, it's 28%.

aprokop avatar Jun 26 '25 01:06 aprokop

This is more than I expected but it's still worth to have something like this available for new users. We should make it clear that by switching from the simplified to the advanced API, people can expect at least 25% but sometimes even more throughput even for simple cases. This way people can start with the simplified API. If they need better performance, they use the advanced API and they can expect substantial improvement (not a 5% difference).

Rombur avatar Jun 26 '25 13:06 Rombur

If they need better performance, they use the advanced API and they can expect substantial improvement (not a 5% difference).

I think it's really is case specific. It would depend on how many times each query hits a leaf, as well as the size of the problem. I can see it being anywhere from 5% to 50% improvement.

aprokop avatar Jun 26 '25 13:06 aprokop

My worry is that someone does a performance comparison between ArborX with the simple interface and another library. As long as we find a way to make that clear, I am fine with this interface.

Rombur avatar Jun 26 '25 13:06 Rombur

My worry is that someone does a performance comparison between ArborX with the simple interface and another library.

Yeah, that's my worry too. It's a tradeoff between making the first experience easy and making the first experience fast. Easy probably beats fast.

aprokop avatar Jun 26 '25 13:06 aprokop

The current situation is unacceptable. It's just too hard to try ArborX. We can add a note like Adios does

Rombur avatar Jun 26 '25 14:06 Rombur

@dalg24 What are your thoughts on this PR? Particularly, do you think it's a reach to provide second argument as int instead of making a user to spell out Iota?

aprokop avatar Jun 27 '25 15:06 aprokop

TODO: make a new construct a free function (outside of indexes)

aprokop avatar Jul 21 '25 18:07 aprokop