astroquery icon indicating copy to clipboard operation
astroquery copied to clipboard

Convert query_region to cone search

Open weaverba137 opened this issue 2 years ago • 4 comments

This PR closes #585.

query_region() now uses query_crossid() internally with slightly different parameters. Effectively, a circular/cone region around each coordinate is searched instead of a rectangle in previous versions.

In addition, this PR adds lots of updates to tests, both online and offline, and cleans up the documentation of several of the methods.

Other comments:

  • query_region() and query_crossid() accept astropy.coordinates.Angle for search radius as well as string or Quantity; internally, Angle is used to validate the search radius input and convert the angle to arcmin.
  • The maximum search radius imposed by SDSS is 3 arcmin. This is now enforced by the code.
  • Added DR17 to offline tests; it had already been enabled for online tests.

weaverba137 avatar Aug 02 '22 17:08 weaverba137

Codecov Report

Merging #2477 (db38571) into main (b1fcfff) will increase coverage by 0.10%. The diff coverage is 97.59%.

@@            Coverage Diff             @@
##             main    #2477      +/-   ##
==========================================
+ Coverage   62.92%   63.03%   +0.10%     
==========================================
  Files         133      133              
  Lines       17302    17301       -1     
==========================================
+ Hits        10888    10905      +17     
+ Misses       6414     6396      -18     
Impacted Files Coverage Δ
astroquery/sdss/core.py 92.06% <97.59%> (+6.15%) :arrow_up:

:mega: Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

codecov[bot] avatar Aug 02 '22 18:08 codecov[bot]

This is a pretty big PR that looks good at a glance. @bsipocz, will you want to do a detailed review? I'm ready to approve this, but I haven't checked every line of the tests.

keflavich avatar Aug 02 '22 19:08 keflavich

Yes, I definitely want to review this with an ETA of maybe next week as this one is already been rather full.

bsipocz avatar Aug 02 '22 19:08 bsipocz

In regards keyword-only arguments: I have converted the entire module except for the get_images and get_spectra functions, which have a fairly complicated call signature. I think that is sufficient for the current PR, given that the original target of the PR was to convert to cone-search, not specifically to change to keyword-only arguments.

weaverba137 avatar Aug 03 '22 18:08 weaverba137

Thanks @weaverba137!

bsipocz avatar Sep 11 '22 04:09 bsipocz

After merging this PR, I've gone through and triaged the sdss labelled issues, and happy to say it has fixed a bunch of them. There are still some long-reported bugs, but after the triage, we have half the number of open issues than before!

🎉

bsipocz avatar Sep 11 '22 04:09 bsipocz

Thanks!

pllim avatar Sep 11 '22 16:09 pllim

Just out of curiosity, is it still possible to use astroquery to make larger radius queries similar to the box version previously used by astroquery.query_region?

You know how the saying goes with old code and user workflows…, but we had built an automated pipeline around large box size crossmatches 😅.

the limit of 3 arcmin is way too small and needs a hefty workaround based on SDSS fields..

we can currently pin the old astroquery version and patch in future fixes that don’t touch this code but it’s not ideal.

As a note from a user, it would have been nice to get a depreciation warning for an API breaking change like this, one that lasted a little while to give us time to adjust. It seems to me the switchover happened rapidly, and we have to do workarounds such as building the old functionality from the git history ourselves and patching in future fixes, or switch to our own SQL queries/casjobs.

One way to enable backwards compatibility is to rename the old version to something like query_rectangle_region or to add a parameter such as rectangular: boolean = False, that preserves the breakage while the deprecation warning is active.

Anyway, hope this can be considered.

emirkmo avatar Feb 08 '23 19:02 emirkmo

@emirkmo - you can write any custom SQL query, so I would suggest looking into using query_sql() for larger sized queries?

bsipocz avatar Feb 08 '23 19:02 bsipocz

Thanks for the advice. It is actually easy to solve as you say, but moving to a custom SQL query is exactly what I was hoping to avoid by using astroquery. I made an issue to track what I guess is actually a feature request to re-enable the rectangular search, as it has a tangible benefit, and the old implementation in astroquery from a few months ago still just works.

emirkmo avatar Feb 08 '23 20:02 emirkmo

I am also interested in this request because the 3 arcmin limit seems pretty arbitrary and I am not sure how to explain to users why suddenly we have this limit downstream.

pllim avatar Feb 08 '23 20:02 pllim

cc @weaverba137

bsipocz avatar Feb 08 '23 20:02 bsipocz

Frankly, we can still roll back that API change (it hasn't landed in a tagged release), but I would need a bit more feedback (doing the proper conesearch was raised as a request a few times in the past...)

cc @keflavich

bsipocz avatar Feb 08 '23 20:02 bsipocz

I've been listening in on this, and I'd like to hear @weaverba137 's take. I think rolling back the API change makes sense; probably best would be to allow either rectangular or circular searches, which is what we do with some other modules (vizier, I think? I'd have to dig more).

Generally I agree that, for something as basic as a rectangle or circle query, we want to provide the simplified user interface and avoid having users write SQL.

keflavich avatar Feb 08 '23 20:02 keflavich

OK, but here the underlying difference is huge. E.g. for the rectangular one we just wrote a brute force +/- ra/dec box, while the radius search is using a different server-side service.

bsipocz avatar Feb 08 '23 21:02 bsipocz

Frankly, we can still roll back that API change (it hasn't landed in a tagged release), but I would need a bit more feedback (doing the proper conesearch was raised as a request a few times in the past...)

cc @keflavich

I think the changes are actually very nice, especially with more proper reporting of errors and test failures. Run query also works as expected now given the "radius" argument. I would just appreciate having access to the rectangular search as well. I will leave any decisions up to the maintainers of course.

emirkmo avatar Feb 08 '23 21:02 emirkmo

I would see value in not having API change for the old setup, but then we run into the core of the previous issues, e.g. that query_region is not actually a cone search, but a box while a cone search is available. Adding a kwarg to fall back to the previous behaviour is still an API change, so while semantically would work, I'm looking for other options, too, maybe the we need a new function name for the current cone search-based implementation. (but I don't have any good ideas how to call it.)

bsipocz avatar Feb 08 '23 21:02 bsipocz

query_cone_region

query_rectangle_region

Is what I would look for in the docs (and first looked for).

Where either one or the other just calls query_region under the hood. That depends on whether backwards compatibility is more important or whether the function arguments making sense is more important, for query_region.

The docstring for either can point to each other and mention the different endpoints used and the radius limitation.

Thanks for the follow-up!

emirkmo avatar Feb 08 '23 21:02 emirkmo

Yeah, let's wait for Ben's take and then we'll figure out something that brings in the least amount of API change.

bsipocz avatar Feb 08 '23 21:02 bsipocz

If we do stick with the API change, we should include a deprecation warning that points users to the right way to do things.

keflavich avatar Feb 08 '23 21:02 keflavich

I'm starting to incline towards adding a kwarg to query_region that defaults to rectangular. Quite ugly, but that would be seamless, and no API change for all the previous use cases (I totally missed this server-side limitation on the query size, so the change doesn't feel like a simple backend refactoring any more to make the code more efficient and to fix a bug)

bsipocz avatar Feb 08 '23 21:02 bsipocz

Before I reply fully, I'm not entirely sure this is the thread we should continue the discussion on. For one, it's harder to find in the GitHub system since it's closed. Another open issue or PR might be easier for everyone to see.

weaverba137 avatar Feb 08 '23 22:02 weaverba137

Before I reply fully, I'm not entirely sure this is the thread we should continue the discussion on. For one, it's harder to find in the GitHub system since it's closed. Another open issue or PR might be easier for everyone to see.

I opened #2659

emirkmo avatar Feb 08 '23 22:02 emirkmo

Unless anyone objects, I am happy to move over to #2659.

weaverba137 avatar Feb 08 '23 22:02 weaverba137

Yes, please move this convo to the issue linked above!

bsipocz avatar Feb 08 '23 22:02 bsipocz