astropy-healpix
Algorithmic complexity of healpix_rangesearch() is way too high
I just tried to run this simple cone search:
```c
int *ind = NULL;
int npix = healpix_rangesearch_radec_simple(1.5, 0., 1.5, 128, &ind);
```
With the C library compiled with "-O2", this computation took 40 seconds on my laptop, which is way off the scale; the expected run time should be below a millisecond.
From looking at the sources, I fear that the run time of healpix_rangesearch scales not just with the number of returned pixels but with its square (caused by the calls to ll_contains, which are themselves linear). This won't do for any application where the cone search returns more than a handful of pixels.
cc @dstndstn
This code was written for small Nside (= 1 or 2!), so there was never a large number of pixels in our application.
Without changing the algorithm, you could try:
- change ll_contains to ll_sorted_contains
- change ll_append to ll_insert_ascending
this "ll" data structure is a block-list: linked-list of arrays, so the sorted inserts and checks can jump by the array size (256 here) rather than iterating through the whole list. Technically doesn't improve the complexity, but it should improve the speed!
The Astrometry.net code also includes the "bt.c" code, which implements a similar API but uses an AVL tree under the hood, so it should yield fast inserts and searches. However, it is derived from GNU libavl, so it probably won't work for your licensing situation.
More broadly: the algorithm here expands a frontier, finding HEALPix neighbours and checking them individually. As Nside increases this is obviously not a good idea: the pixelization becomes more like a regular grid, and you could use a different approach, say finding the start and end pixels within each row inside each base healpixel.
Oh, I had assumed that this code was supposed to be the equivalent of query_disc()!
The proposed change to sorted lists will actually reduce the complexity, as far as I understand: from O(n**2) to O(n log n), if n is the number of returned pixels.
However, it might be worth thinking about going all the way and reaching a complexity of O(sqrt(n)). Here is a rough sketch for the NESTED scheme (a code sketch follows the list):
- put all 12 base pixels in a work list
- while the list is not empty:
- take the next pixel from the list
- if it is completely outside the disk, do nothing
- if it is completely inside, add the range of its subpixels to the output
- otherwise (i.e. the pixel partially overlaps the disk): if we are at the desired resolution, add the pixel to the output; else add its four subpixels at the next refinement level to the work list.
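A minimal sketch of that loop, written recursively rather than with an explicit work list; disk_overlap() is a hypothetical placeholder (a real version needs actual HEALPix pixel-boundary geometry):

```c
/* Sketch of the hierarchical cone search in the NESTED scheme.
   disk_overlap() is a hypothetical placeholder: a real version must
   classify a NESTED pixel at a given order against the query disk. */
#include <stdint.h>
#include <stdio.h>

typedef enum { OUTSIDE, PARTIAL, INSIDE } overlap_t;

static overlap_t disk_overlap(int order, int64_t pix) {
    (void)order; (void)pix;
    return PARTIAL;   /* stub so the sketch compiles; needs real geometry */
}

/* Emit the NESTED pixel range [lo, hi) at the target order. */
static void emit_range(int64_t lo, int64_t hi) {
    printf("[%lld, %lld)\n", (long long)lo, (long long)hi);
}

static void refine(int order, int64_t pix, int target_order) {
    overlap_t ov = disk_overlap(order, pix);
    if (ov == OUTSIDE)
        return;                               /* completely outside: drop */
    if (ov == INSIDE) {
        /* Completely inside: emit the whole subpixel range at once. */
        int shift = 2 * (target_order - order);
        emit_range(pix << shift, (pix + 1) << shift);
        return;
    }
    if (order == target_order) {              /* partial overlap, full depth */
        emit_range(pix, pix + 1);
        return;
    }
    for (int child = 0; child < 4; child++)   /* refine the four subpixels */
        refine(order + 1, (pix << 2) + child, target_order);
}

static void cone_search_nested(int target_order) {
    for (int64_t base = 0; base < 12; base++) /* the 12 base pixels */
        refine(0, base, target_order);
}
```

Because fully inside pixels are emitted as whole index ranges, only pixels crossing the disk boundary get refined, so the number of pixels visited scales with the boundary length, roughly O(sqrt(n)).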
No, the complexity is still O(N**2), because the (block-list versions of) insert_ascending and sorted_contains are still O(N); the top-level structure is a linked list, so it can't be binary-searched.
For the RING scheme, you already proposed an efficient way: go through all HEALPix rings touching the disk, determine which range of pixels in each ring overlaps the disk, and append it to the output.
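A hedged sketch of that per-ring computation, using the standard HEALPix ring formulas; phi wrap-around, the half-pixel phase shift of some rings, and disks enclosing a pole are glossed over:

```c
/* Sketch: RING-scheme cone search emitting one pixel range per ring. */
#include <math.h>
#include <stdio.h>

/* Colatitude of ring iring (1 .. 4*nside-1). */
static double ring_theta(long nside, long iring) {
    if (iring < nside)                       /* north polar cap */
        return acos(1.0 - (double)(iring * iring) / (3.0 * nside * nside));
    if (iring <= 3 * nside)                  /* equatorial belt */
        return acos(4.0 / 3.0 - 2.0 * iring / (3.0 * nside));
    long i = 4 * nside - iring;              /* south polar cap */
    return acos(-(1.0 - (double)(i * i) / (3.0 * nside * nside)));
}

/* Number of pixels in ring iring, and RING index of its first pixel. */
static long ring_npix(long nside, long iring) {
    if (iring < nside)      return 4 * iring;
    if (iring <= 3 * nside) return 4 * nside;
    return 4 * (4 * nside - iring);
}
static long ring_firstpix(long nside, long iring) {
    if (iring < nside)      return 2 * iring * (iring - 1);
    if (iring <= 3 * nside) return 2 * nside * (nside - 1)
                                 + 4 * nside * (iring - nside);
    long i = 4 * nside - iring;
    return 12 * nside * nside - 2 * i * (i + 1);
}

/* Disk centre (theta0, phi0) and radius r, all in radians. */
static void cone_search_ring(long nside, double theta0, double phi0, double r) {
    for (long iring = 1; iring <= 4 * nside - 1; iring++) {
        double theta = ring_theta(nside, iring);
        if (fabs(theta - theta0) > r)
            continue;                        /* ring misses the disk */
        /* Azimuthal half-width of the disk at this colatitude:
           cos(dphi) = (cos r - cos(theta)cos(theta0)) / (sin(theta)sin(theta0)) */
        double c = (cos(r) - cos(theta) * cos(theta0))
                   / (sin(theta) * sin(theta0));
        double dphi = (c <= -1.0) ? M_PI : (c >= 1.0) ? 0.0 : acos(c);
        long np = ring_npix(nside, iring);
        long lo = (long)floor(np * (phi0 - dphi) / (2.0 * M_PI));
        long hi = (long)ceil (np * (phi0 + dphi) / (2.0 * M_PI));
        printf("ring %ld: pixels [%ld, %ld]\n", iring,
               ring_firstpix(nside, iring) + lo,
               ring_firstpix(nside, iring) + hi);
    }
}
```

The work here is one range per ring, i.e. O(Nside) rings at most, independent of how many pixels the ranges cover.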
These range-based approaches are important for catalog cross-matching and similar applications, where one often needs cone searches at very high resolution with not-too-small radii; storing explicit pixel lists then becomes prohibitively expensive.
Sorry, you are of course correct about the complexity staying O(n**2)!