SDSS.query_region() returns misleading Table containing HTML error

Open jkrick opened this issue 5 months ago • 7 comments

When calling SDSS.query_region() with valid SkyCoord input, I occasionally receive what appears to be a successful astropy.table.Table response, but the contents are actually an HTML error page from the SDSS server. This causes downstream failures when passed to SDSS.get_spectra() (e.g., KeyError: 'run2d'), which expects real SDSS metadata. Here’s a minimal working example:

from astroquery.sdss import SDSS
from astropy.coordinates import SkyCoord
import astropy.units as u

coord = SkyCoord(ra=148.70583333*u.deg, dec=9.27108333*u.deg)
xid = SDSS.query_region(coord, radius=3*u.arcsec, spectro=True, data_release=18)

print(type(xid))       # <class 'astropy.table.table.Table'>
print(xid)             # Contains an HTML error page
print(xid.colnames)    # ['<html><head>...']

Expected behavior: If the request failed, xid should be None or an Exception should be raised.

Actual behavior: An astropy.table.Table is returned, but with HTML content instead of usable columns like 'plate', 'fiberID', 'mjd', etc. This leads to confusing downstream errors.

Suggested improvement: Detect HTML responses and raise a proper error (e.g., RemoteServiceError) instead of wrapping them in a Table. Alternatively, return None or a Table with metadata indicating the failure.
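
A minimal sketch of the detection idea, assuming the failure always shows up as a column name that starts with an HTML tag, as in the output above. `looks_like_html` and `HTMLResponseError` are hypothetical names used for illustration and are not part of astroquery; inside astroquery, the existing `astroquery.exceptions.RemoteServiceError` would probably be the natural exception to raise instead.

```python
import re


class HTMLResponseError(Exception):
    """Hypothetical exception: the 'table' is really an HTML page."""


# Matches column names that begin with a common top-level HTML tag.
_HTML_START = re.compile(r"^\s*<(!doctype|html|head|body)\b", re.IGNORECASE)


def looks_like_html(colnames):
    """Return True if any column name starts with an HTML tag."""
    return any(_HTML_START.match(str(name)) for name in colnames)


def raise_if_html(colnames):
    """Raise HTMLResponseError when the column names look like HTML."""
    if looks_like_html(colnames):
        raise HTMLResponseError(
            f"Server returned HTML instead of data: {colnames[0]!r}"
        )
```

With the example above, this would be called as `raise_if_html(xid.colnames)` right after `SDSS.query_region()`, before the result is passed to `SDSS.get_spectra()`.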

Thanks for maintaining Astroquery — this would really help users handle these cases more cleanly!

jkrick avatar Jul 10 '25 23:07 jkrick

cc @weaverba137

bsipocz avatar Jul 12 '25 22:07 bsipocz

I tried to reproduce this today:

>>> from astroquery.sdss import SDSS
>>> from astropy.coordinates import SkyCoord
>>> import astropy.units as u
>>> coord = SkyCoord(ra=148.70583333*u.deg, dec=9.27108333*u.deg)
>>> xid = SDSS.query_region(coord, radius=3*u.arcsec, spectro=True, data_release=18)
>>> xid
<Table length=1>
       ra              dec               objid         run  rerun camcol field      z      plate  mjd  fiberID      specobjid      run2d
    float64          float64             uint64       int64 int64 int64  int64   float64   int64 int64  int64         uint64       int64
---------------- ---------------- ------------------- ----- ----- ------ ----- ----------- ----- ----- ------- ------------------- -----
148.706500405545 9.27107701783245 1237661063879000073  3630   301      1   215 0.004862683  1306 52996       5 1470426702990567424    26

I suspect a transient server error, possibly due to maintenance or heavy load.

weaverba137 avatar Jul 14 '25 14:07 weaverba137

Thank you for checking!

bsipocz avatar Jul 14 '25 18:07 bsipocz

I still see a documentation/error-handling todo here for astroquery: capturing a somewhat descriptive error instead of the KeyError. I'm not sure it's possible without overcomplicating the code.

bsipocz avatar Jul 14 '25 18:07 bsipocz

I agree, but it is tricky. First off, if you can't reproduce the error message returned because the server is actually working, it's going to be hard to intercept. I also don't remember if HTML is what is returned when the query is successful, so interpreting all HTML as an error wouldn't necessarily work either.
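
One possible way around the ambiguity is to look at the HTTP status rather than the body, since a status check does not depend on whether successful responses are ever HTML. The sketch below assumes the failing responses really do arrive with a non-2xx status code, which has not been confirmed in this thread. `ensure_ok` and `RemoteStatusError` are hypothetical names; the only thing assumed about `response` is that it has `status_code` and `text` attributes, as a `requests.Response` does.

```python
class RemoteStatusError(Exception):
    """Hypothetical exception for a non-2xx HTTP response."""


def ensure_ok(response):
    """Return `response` unchanged if its status is 2xx, else raise.

    The first part of the body is included in the message so the
    user sees the server's own explanation, not a KeyError later on.
    """
    status = response.status_code
    if not 200 <= status < 300:
        snippet = response.text[:80].replace("\n", " ")
        raise RemoteStatusError(f"HTTP {status} from server: {snippet}")
    return response
```

If the server reports errors with a 200 status, this check would not catch them and a content-based test would still be needed.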

weaverba137 avatar Jul 14 '25 19:07 weaverba137

I have the full traceback from Jessica in our internal slack/fornax, so I will look into the details a bit to see if there is any low-hanging fruit.

https://github.com/nasa-fornax/fornax-demo-notebooks/issues/437

bsipocz avatar Jul 14 '25 19:07 bsipocz

I can confirm that this bug is present today.

When querying a position in SDSS, there are intermittent service interruptions (presumably from the SDSS side) that are returned by astroquery (v0.4.7) as astropy Tables.

from astroquery.sdss import SDSS
from astropy import units
from astropy.coordinates import SkyCoord
sdss_result = SDSS.query_region(
    SkyCoord(227.76552, 36.5820, unit=(units.deg, units.deg)),
    radius=0.05*units.deg,
    spectro=False,
    photoobj_fields=['objID', 'ra', 'dec'],
    timeout=180,
)

Approximately 1 in every 20 times the output is an astropy.table.table.Table object, with columns <TableColumns names=('<html><body><h1>502 Bad Gateway</h1>')>

         <html><body><h1>502 Bad Gateway</h1>         
------------------------------------------------------
The server returned an invalid or incomplete response.
                                        </body></html>

Locally I'm able to work around it by clunkily testing that the output object contains a column matching one of the requested fields.

# Only use the result if it exists and contains a requested column
if sdss_result is not None and 'objID' in sdss_result.colnames:
    ...  # continue with the valid table

Maybe that would be a good solution to identify this problem internally?
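
Since the failure is intermittent, the same column check can be combined with a retry. This is only a sketch of that workaround, not how astroquery handles it: `query_with_retry` is a hypothetical helper, `query` is any zero-argument callable, and the result only needs a `colnames` attribute (as an astropy Table has). Treating `None` as a genuine "no match" answer is an assumption, and column names are compared case-insensitively because the case returned by the server may differ from the case requested.

```python
import time


def query_with_retry(query, required_columns, retries=3, delay=1.0):
    """Call `query()` until the result contains all `required_columns`.

    Returns the first valid result, or None if the query returns None
    (assumed here to mean "no matching objects").
    Raises RuntimeError if every attempt returns an invalid table.
    """
    wanted = {col.lower() for col in required_columns}
    last = None
    for attempt in range(retries):
        last = query()
        if last is None:
            return None
        have = {str(name).lower() for name in last.colnames}
        if wanted <= have:
            return last
        if attempt < retries - 1:
            time.sleep(delay)  # brief pause before asking again
    raise RuntimeError(
        f"No valid result after {retries} attempts; "
        f"last columns: {list(last.colnames)}"
    )
```

For the query above, the call would look like `query_with_retry(lambda: SDSS.query_region(...), ['objID', 'ra', 'dec'])`, with the same arguments passed to `SDSS.query_region()` as before.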

Thanks for this great service :)

MatSmithAstro avatar Oct 04 '25 14:10 MatSmithAstro