kiddo_v1 A better PBC implementation

This incorporates some of the feedback you gave me on the last attempt. I had made a mess of my branch so I'm here with a new one and a new PR. This one has a much cleaner API and also has some correctness tests with brute force checks. I would like to continue adding a few more functions e.g. best_n_within, but for now nearest, nearest_one, within and within_unsorted work with periodic boundary conditions.

I noticed that your check_point suffered from the same ill so I changed it to allow early returns

    fn check_point(&self, point: &[A; K]) -> Result<(), ErrorKind> {
        for p in point {
            if !p.is_finite(){
                return Err(ErrorKind::NonFiniteCoordinate)
            }
        }

        if let Some(periodic) = self.periodic {
            for (idx, &p) in point.iter().enumerate() {
                if p < A::zero() || p > periodic[idx] {
                    return Err(ErrorKind::OutOfPeriodicBounds)
                }
            }
        }
        Ok(())
    }

Also I didn't get to thinking about how to serialize the Option, which is now an Option<&'a [A; K]> so that we don't have an identical copy of the array at every KdTree.

I wrote a benchmark with 1m query points and 100k data points and timed building and querying and got (non-pbc/pbc)

average: 26/26 millis to build, 85/92 millis to query

running on 48 threads. I can include this bench if you'd like. The build time is expected to be exactly the same because in this implementation the build is the same. The PBCs are handled by checking if it is necessary to apply PBCs (e.g. if your 1NN distance > distance to edge of box) and doing additional queries with only those relevant images.

Jul 04 '22 12:07 cavemanloverboy

Hiya - just thought I'd drop a quick comment to let you know I've not forgotten about this PR! I was exploring ways to see how it would fit in with a potential change to the library that I'm experimenting with that will allow users to choose from more than one underlying implementation that will all implement a KdTree trait. This way, we could have a Standard, Periodic, and an upcoming Index based internal representation that all expose the same trait, allowing people to switch between them.

This is proving to be quite a significant undertaking, so I've decided to get your changes in first and move back on to the Trait based approach afterwards.

I've got a few small changes I'd like to propose to your PBC PR though. I'll share them over the next day or two as soon as I've got all the tests passing, but the main aspects are:

removing periodic: Option<&'a [A; K]> from the KdTree struct. Since this periodic array is a reference to an array that exists outside of the KdTree struct, we can instead keep it out of the struct and just pass it as an argument to nearest_one_periodic and all of the other periodic-specific functions. This has a couple of advantages: The memory layout of the KdTree struct does not increase in size, keeping the current memory efficiency for users that aren't using PBC. Additionally, for users that are working with serialized trees that are loaded in from the filesystem, since the memory layout is the same they won't need to rebuild all their serialized datastructures when upgrading their version of Kiddo to one that includes these changes. (This can be quite a significant overhead - I'm working with a serialized tree containing tens of millions of items that would take me quite some time to recreate).
The test cases for PBC have quite complex code to check that the PBC behaviour works correctly. I've added a simpler, manually constructed test case where the expected results can be easily determined by simple manual calculations. I like your more heavyweight tests too, they're definitely a great addition, but especially for low level data structure libraries like this, it is good to have some more fundamental tests where the correctness of the library can be verified by hand, as a bit of an underlying sanity check.

Hopefully we can get this merged soon! Thanks again :-)

Jul 13 '22 06:07 sdd

Hi!

I've opened a second PR for this which contains all your commits from this PR plus one more of my own. I made a few modifications:

feat: PBC now expects the period as an argument when querying using PBC functions, rather than storing it on each node. This keeps the node size down and prevents the memory layout from changing with the introduction of PBC, which would force existing users to have to rebuild serialized trees if they wanted to upgrade.
tests: PBC tests refactored to move common brute force PBC check code and data/query/tree generating into separate function
tests: PBC tests use neater float asserts from approx package
tests: Rayon removed and PBC unit test query point and data point count reduced to improve unit test run time
style: re-ran cargo fmt and cargo clippy and fixed clippy lints that were introduced

Any chance you could take a look and see what you think?

Thanks!

Aug 10 '22 06:08 sdd

Hi!

I've opened a second PR for this which contains all your commits from this PR plus one more of my own. I made a few modifications:

feat: PBC now expects the period as an argument when querying using PBC functions, rather than storing it on each node. This keeps the node size down and prevents the memory layout from changing with the introduction of PBC, which would force existing users to have to rebuild serialized trees if they wanted to upgrade.

tests: PBC tests refactored to move common brute force PBC check code and data/query/tree generating into separate function

tests: PBC tests use neater float asserts from approx package

tests: Rayon removed and PBC unit test query point and data point count reduced to improve unit test run time

style: re-ran cargo fmt and cargo clippy and fixed clippy lints that were introduced

Any chance you could take a look and see what you think?

Thanks!

These all sound like great changes. Curious to see the performance. I'll check it out very soon.

Aug 14 '22 03:08 cavemanloverboy

kiddo_v1 kiddo_v1 copied to clipboard

A better PBC implementation

kiddo_v1
kiddo_v1 copied to clipboard