astroquery
astroquery copied to clipboard
Convert query_region to cone search
This PR closes #585.
query_region() now uses query_crossid() internally with slightly different parameters. Effectively, a circular/cone region around each coordinate is searched instead of a rectangle in previous versions.
In addition, this PR adds lots of updates to tests, both online and offline, and cleans up the documentation of several of the methods.
Other comments:
query_region()andquery_crossid()acceptastropy.coordinates.Anglefor search radius as well as string or Quantity; internally, Angle is used to validate the search radius input and convert the angle to arcmin.- The maximum search radius imposed by SDSS is 3 arcmin. This is now enforced by the code.
- Added DR17 to offline tests; it had already been enabled for online tests.
Codecov Report
Merging #2477 (db38571) into main (b1fcfff) will increase coverage by
0.10%. The diff coverage is97.59%.
@@ Coverage Diff @@
## main #2477 +/- ##
==========================================
+ Coverage 62.92% 63.03% +0.10%
==========================================
Files 133 133
Lines 17302 17301 -1
==========================================
+ Hits 10888 10905 +17
+ Misses 6414 6396 -18
| Impacted Files | Coverage Δ | |
|---|---|---|
| astroquery/sdss/core.py | 92.06% <97.59%> (+6.15%) |
:arrow_up: |
:mega: Codecov can now indicate which changes are the most critical in Pull Requests. Learn more
This is a pretty big PR that looks good at a glance. @bsipocz, will you want to do a detailed review? I'm ready to approve this, but I haven't checked every line of the tests.
Yes, I definitely want to review this with an ETA of maybe next week as this one is already been rather full.
In regards keyword-only arguments: I have converted the entire module except for the get_images and get_spectra functions, which have a fairly complicated call signature. I think that is sufficient for the current PR, given that the original target of the PR was to convert to cone-search, not specifically to change to keyword-only arguments.
Thanks @weaverba137!
After merging this PR, I've gone through and triaged the sdss labelled issues, and happy to say it has fixed a bunch of them. There are still some long-reported bugs, but after the triage, we have half the number of open issues than before!
🎉
Thanks!
Just out of curiosity, is it still possible to use astroquery to make larger radius queries similar to the box version previously used by astroquery.query_region?
You know how the saying goes with old code and user workflows…, but we had built an automated pipeline around large box size crossmatches 😅.
the limit of 3 arcmin is way too small and needs a hefty workaround based on SDSS fields..
we can currently pin the old astroquery version and patch in future fixes that don’t touch this code but it’s not ideal.
As a note from a user, it would have been nice to get a depreciation warning for an API breaking change like this, one that lasted a little while to give us time to adjust. It seems to me the switchover happened rapidly, and we have to do workarounds such as building the old functionality from the git history ourselves and patching in future fixes, or switch to our own SQL queries/casjobs.
One way to enable backwards compatibility is to rename the old version to something like query_rectangle_region or to add a parameter such as rectangular: boolean = False, that preserves the breakage while the deprecation warning is active.
Anyway, hope this can be considered.
@emirkmo - you can write any custom SQL query, so I would suggest looking into using query_sql() for larger sized queries?
Thanks for the advice. It is actually easy to solve as you say, but moving to a custom SQL query is exactly what I was hoping to avoid by using astroquery. I made an issue to track what I guess is actually a feature request to re-enable the rectangular search, as it has a tangible benefit, and the old implementation in astroquery from a few months ago still just works.
I am also interested in this request because the 3 arcmin limit seems pretty arbitrary and I am not sure how to explain to users why suddenly we have this limit downstream.
cc @weaverba137
Frankly, we can still roll back that API change (it hasn't landed in a tagged release), but I would need a bit more feedback (doing the proper conesearch was raised as a request a few times in the past...)
cc @keflavich
I've been listening in on this, and I'd like to hear @weaverba137 's take. I think rolling back the API change makes sense; probably best would be to allow either rectangular or circular searches, which is what we do with some other modules (vizier, I think? I'd have to dig more).
Generally I agree that, for something as basic as a rectangle or circle query, we want to provide the simplified user interface and avoid having users write SQL.
OK, but here the underlying difference is huge. E.g. for the rectangular one we just wrote a brute force +/- ra/dec box, while the radius search is using a different server-side service.
Frankly, we can still roll back that API change (it hasn't landed in a tagged release), but I would need a bit more feedback (doing the proper conesearch was raised as a request a few times in the past...)
cc @keflavich
I think the changes are actually very nice, especially with more proper reporting of errors and test failures. Run query also works as expected now given the "radius" argument. I would just appreciate having access to the rectangular search as well. I will leave any decisions up to the maintainers of course.
I would see value in not having API change for the old setup, but then we run into the core of the previous issues, e.g. that query_region is not actually a cone search, but a box while a cone search is available. Adding a kwarg to fall back to the previous behaviour is still an API change, so while semantically would work, I'm looking for other options, too, maybe the we need a new function name for the current cone search-based implementation. (but I don't have any good ideas how to call it.)
query_cone_region
query_rectangle_region
Is what I would look for in the docs (and first looked for).
Where either one or the other just calls query_region under the hood. That depends on whether backwards compatibility is more important or whether the function arguments making sense is more important, for query_region.
The docstring for either can point to each other and mention the different endpoints used and the radius limitation.
Thanks for the follow-up!
Yeah, let's wait for Ben's take and then we'll figure out something that brings in the least amount of API change.
If we do stick with the API change, we should include a deprecation warning that points users to the right way to do things.
I'm starting to incline towards adding a kwarg to query_region that defaults to rectangular. Quite ugly, but that would be seamless, and no API change for all the previous use cases (I totally missed this server-side limitation on the query size, so the change doesn't feel like a simple backend refactoring any more to make the code more efficient and to fix a bug)
Before I reply fully, I'm not entirely sure this is the thread we should continue the discussion on. For one, it's harder to find in the GitHub system since it's closed. Another open issue or PR might be easier for everyone to see.
Before I reply fully, I'm not entirely sure this is the thread we should continue the discussion on. For one, it's harder to find in the GitHub system since it's closed. Another open issue or PR might be easier for everyone to see.
I opened #2659
Unless anyone objects, I am happy to move over to #2659.
Yes, please move this convo to the issue linked above!