SOLR-16292 : NVector for alternative great-circle distance calculations
https://issues.apache.org/jira/browse/SOLR-16292
Description
N-Vector is a three-parameter representation of a point on the Earth's surface that can be used to calculate the great-circle distance (assuming a spherical Earth).
It uses FastInvTrig, whose acos compares well with Math.acos for small distances. N-Vector is a faster way of calculating the great-circle distance than Haversine.
Solution
- Store N-Vectors in the Solr index via a CoordinateFieldType, splitting the three n-vector values into single-valued double subfields; the java Math class is used when indexing these.
- Use a Maclaurin approximation for acos to calculate the great-circle distance at query time via a function query.
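A minimal sketch of the idea, assuming a spherical Earth (the method names and radius constant below are illustrative, not the PR's actual NVectorUtil API):

```java
// Convert lat/lon in degrees to a unit n-vector {x, y, z}.
static double[] latLonToNVector(double latDeg, double lonDeg) {
  double lat = Math.toRadians(latDeg);
  double lon = Math.toRadians(lonDeg);
  return new double[] {
      Math.cos(lat) * Math.cos(lon),
      Math.cos(lat) * Math.sin(lon),
      Math.sin(lat)};
}

// Great-circle distance between two unit n-vectors: d = R * acos(a . b).
// Math.acos is used here for clarity; at query time the PR swaps in a faster approximation.
static double greatCircleMeters(double[] a, double[] b) {
  double earthRadiusMeters = 6_371_000.0; // mean Earth radius, spherical model
  double dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
  return earthRadiusMeters * Math.acos(Math.max(-1.0, Math.min(1.0, dot)));
}
```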
Tests
- Tests for the FastInvTrig implementation (FastInvTrigTest) comparing its acos with Math.acos.
- Tests for NVectorUtil, NVectorDist, latLongToNVector and NVectorToLatLong.
- Tests for indexing N-Vectors and calculating the great-circle distance via the function query.
Checklist
Please review the following and check all that apply:
- [x] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the main branch.
- [ ] I have run ./gradlew check.
- [x] I have added tests for my changes.
- [ ] I have added documentation for the Reference Guide
Please also run ./gradlew tidy to make sure that your code adheres to our formatting conventions. :)
I found your source FastInvTrig repo and ran the benchmarks from there, with a few tweaks to improve measurement accuracy and also added Lucene's SloppyMath into the competition.
My results are directionally the same as yours -
Benchmark Mode Cnt Score Error Units
FastInvTrigBenchmark.acosBM avgt 3 17.576 ± 1.243 ns/op
FastInvTrigBenchmark.fastMathAcosBM avgt 3 87.301 ± 4.303 ns/op
FastInvTrigBenchmark.haversineBM avgt 3 95.062 ± 17.137 ns/op
FastInvTrigBenchmark.mathAcosBM avgt 3 140.391 ± 5.508 ns/op
FastInvTrigBenchmark.sloppyHaversineMeters avgt 3 46.647 ± 8.927 ns/op
FastInvTrigBenchmark.sloppyHaversineSortKey avgt 3 33.637 ± 6.364 ns/op
One concern for comparison that I have would be that the current Lucene implementation (which is only 2x slower than yours) has an upper error bound of 40cm. I think we can get there with your series expansions by adding additional terms, but I don't remember my calculus well enough to know how many we need. If we do that, how much does performance suffer? Are we still competitive? You're currently at 10m, which I suspect might be too large of a delta to be useful for the applications that I am familiar with.
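For reference, the kind of truncated series being discussed might look like the sketch below (illustrative only; the actual FastInvTrig expansion and its coefficients may differ). The terms parameter controls the accuracy/speed trade-off:

```java
// Truncated Maclaurin-style approximation of acos(x) for x in [-1, 1],
// via acos(x) = pi/2 - asin(x) and the asin power series.
// More terms -> smaller error, but more multiplications per call.
static double acosApprox(double x, int terms) {
  double sum = 0.0;
  double coeff = 1.0; // (2k)! / (4^k * (k!)^2), equals 1 for k = 0
  double power = x;   // x^(2k+1), starts at x
  double x2 = x * x;
  for (int k = 0; k < terms; k++) {
    sum += coeff * power / (2 * k + 1);
    coeff *= (2.0 * k + 1) / (2.0 * k + 2); // advance the coefficient to k + 1
    power *= x2;
  }
  return Math.PI / 2 - sum;
}
```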
Thanks for looking into this more, @madrob.
I wasn't aware of SloppyMath. If we assume Math.acos is correct, then to get the same accuracy with FastInvTrig (to within 40cm), I've tested that it takes 17 terms, which after running a benchmark means that:
Benchmark (num_points) Mode Cnt Score Error Units
FastInvTrigBenchmark.acosBM10 2000000 avgt 2 44.149 ms/op
FastInvTrigBenchmark.acosBM17 2000000 avgt 2 67.977 ms/op
FastInvTrigBenchmark.fastMathAcosBM 2000000 avgt 2 252.578 ms/op
FastInvTrigBenchmark.haversineBM 2000000 avgt 2 288.209 ms/op
FastInvTrigBenchmark.mathAcosBM 2000000 avgt 2 300.509 ms/op
FastInvTrigBenchmark.mathAsin 2000000 avgt 2 320.407 ms/op
FastInvTrigBenchmark.sloppyAsin 2000000 avgt 2 50.152 ms/op
FastInvTrigBenchmark.sloppyHaversinMeters 2000000 avgt 2 134.309 ms/op
FastInvTrigBenchmark.sloppyHaversinSortKey 2000000 avgt 2 82.845 ms/op
So then:
- SloppyMath.HaversinSortKey is 1.9x slower than FastInvTrig with 10 terms (as you found)
- SloppyMath.HaversinSortKey is 1.2x slower than FastInvTrig with 17 terms
However I also noticed that SloppyMath has an asin implementation, and with pi/2-asin(x) = acos(x)
https://www.wolframalpha.com/input?i2d=true&i=%5C%2840%29Divide%5Bpi%2C2%5D-asin%5C%2840%29x%5C%2841%29%5C%2841%29-acos%5C%2840%29x%5C%2841%29
After adding this to the benchmark, using the n-vector and this identity ^ for acos:
FastInvTrig with 17 terms is 1.3x slower than SloppyMath.asin!
The caveat is that SloppyMath.asin uses more memory for caching values I think.
So I'm wondering now whether to abandon the FastInvTrig series expansion and use SloppyMath.asin for NVector. What do you think?
I'm a little confused as to what the sloppyAsin results are...
From your tests, we should still prefer NVector with Maclaurin expansion at 17 terms over using Sloppy Haversine, right? Is Sloppy Asin using NVector with no series expansion for the trig, and instead the lookup tables from SloppyMath? Which gives us the 40cm accuracy at almost the original 10-term expansion performance level?
Can we do a similar trick to split the calculation to get an n-vector sort key and an n-vector meters and get even more speedup for the cases where we don't care about absolute distances?
> I'm a little confused as to what the sloppyAsin results are...
> From your tests, we should still prefer NVector with Maclaurin expansion at 17 terms over using Sloppy Haversine, right? Is Sloppy Asin using NVector with no series expansion for the trig, and instead the lookup tables from SloppyMath? Which gives us the 40cm accuracy at almost the original 10-term expansion performance level?
We calculate the great-circle distance as d = R * acos(a.b), where d is the distance, R is the radius, and a, b are n-vectors (a.b is the scalar dot product).
We also know that acos(x) = pi/2 - asin(x)
So I compared FastInvTrig.acos (with 17 terms) with SloppyMath.asin, and found that FastInvTrig.acos with 17 terms is 1.3x slower than SloppyMath.asin.
This is what I meant with FastInvTrigBenchmark.sloppyAsin
So SloppyMath.asin, at the required precision, is faster than FastInvTrig.acos.
So I was thinking of abandoning FastInvTrig.acos in favour of SloppyMath.asin. What do you think?
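Concretely, the SloppyMath.asin variant would look something like this sketch (method name and radius constant are illustrative):

```java
import org.apache.lucene.util.SloppyMath;

// d = R * acos(a . b), rewritten as R * (pi/2 - asin(a . b)) so that
// Lucene's table-driven SloppyMath.asin can be used instead of Math.acos.
static double nvectorDistanceMeters(double[] a, double[] b) {
  double earthRadiusMeters = 6_371_000.0;
  double dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
  return earthRadiusMeters * (Math.PI / 2 - SloppyMath.asin(dot));
}
```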
> Can we do a similar trick to split the calculation to get an n-vector sort key and an n-vector meters and get even more speedup for the cases where we don't care about absolute distances?
Yes! From looking at the acos plot (https://www.wolframalpha.com/input?i2d=true&i=acos%5C%2840%29x%5C%2841%29) it's a one-to-one function, so it is well suited for comparison. I did a quick test which confirmed that the dot product alone is enough for comparing values (which is all we need to do, as we cache the n-vectors in the NVectorField). So we then have the following benchmark (with NVectorSortKey as this comparison, and the fastest):
Benchmark (num_points) Mode Cnt Score Error Units
FastInvTrigBenchmark.NVectorSortKey 2000000 avgt 2 13.846 ms/op
FastInvTrigBenchmark.acosBM10 2000000 avgt 2 45.620 ms/op
FastInvTrigBenchmark.acosBM17 2000000 avgt 2 68.673 ms/op
FastInvTrigBenchmark.fastMathAcosBM 2000000 avgt 2 249.075 ms/op
FastInvTrigBenchmark.haversineBM 2000000 avgt 2 289.814 ms/op
FastInvTrigBenchmark.mathAcosBM 2000000 avgt 2 304.411 ms/op
FastInvTrigBenchmark.mathAsin 2000000 avgt 2 323.126 ms/op
FastInvTrigBenchmark.sloppyAsin 2000000 avgt 2 49.749 ms/op
FastInvTrigBenchmark.sloppyHaversinMeters 2000000 avgt 2 135.847 ms/op
FastInvTrigBenchmark.sloppyHaversinSortKey 2000000 avgt 2 82.149 ms/op
So in Solr perhaps we can use NVectorSortKey for sort comparisons then.
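As a sketch of that idea (names are illustrative): since acos is monotonically decreasing on [-1, 1], a larger dot product always means a smaller great-circle distance, so sorting needs no trig at all:

```java
// Trig-free sort key: grows with distance, so sorting ascending by this key
// orders documents from nearest to farthest without calling asin/acos.
static double nvectorSortKey(double[] queryNVector, double[] docNVector) {
  double dot = queryNVector[0] * docNVector[0]
      + queryNVector[1] * docNVector[1]
      + queryNVector[2] * docNVector[2];
  return 1.0 - dot; // 0 for identical points, up to 2 for antipodal points
}
```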
Ok, I get it now. Yes, let's do the SloppyMath.asin approach, splitting out the sort key (I would call it "dot product" and add comments about why we can use it as a sort key, rather than calling the method SortKey).
- SloppyMath.asin now replaces FastInvTrig.acos
- Sorting in the function query is on the dot product; values still fetch the distance
I think the core of the idea is good here. Big thing missing is an update to the ref guide, probably function-queries.adoc or spatial-search.adoc?
When I get a moment I'll add something to spatial-search.adoc, perhaps?
Another thought I had was how exactly we expect users to use this. If they're still going to be providing indexable data in lat/long and also expecting lat/long for output information, then will this really be faster than using haversine? Or does it move the computation to the indexing side, where we only have to do it once, so that over multiple queries the total time taken is reduced...
This moves most of the calculation to the indexing side. At query time we then only need to calculate an n-vector for the input lat/lon, and the sorting can be done on the dot product alone. Users can optionally index spatial data in multiple formats (e.g. LatLonPointSpatialField and NVectorField) should they find that gives a performance boost. N-Vector provides faster comparison (and great-circle distance calculation) than Haversine, and additionally does so without caveats or accuracy degradation for calculations at the poles/equator, etc.
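As a rough illustration of that split, using the hypothetical helpers sketched earlier in this thread:

```java
// Index time, once per document: lat/lon -> n-vector, stored as three double subfields.
double[] docNVector = latLonToNVector(docLat, docLon);

// Query time: convert the query point once, then each matching document costs only
// a dot product (for sorting) or one asin call (when the actual distance is needed).
double[] queryNVector = latLonToNVector(queryLat, queryLon);
double sortKey = nvectorSortKey(queryNVector, docNVector);
double meters = nvectorDistanceMeters(queryNVector, docNVector);
```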