SDSS.query_region() returns misleading Table containing HTML error
When calling SDSS.query_region() with valid SkyCoord input, I occasionally receive what appears to be a successful astropy.table.Table response, but the contents are actually an HTML error page from the SDSS server. This causes downstream failures when passed to SDSS.get_spectra() (e.g., KeyError: 'run2d'), which expects real SDSS metadata. Here’s a minimal working example:
from astroquery.sdss import SDSS
from astropy.coordinates import SkyCoord
import astropy.units as u
coord = SkyCoord(ra=148.70583333*u.deg, dec=9.27108333*u.deg)
xid = SDSS.query_region(coord, radius=3*u.arcsec, spectro=True, data_release=18)
print(type(xid)) # <class 'astropy.table.table.Table'>
print(xid) # Contains an HTML error page
print(xid.colnames) # ['<html><head>...']
Expected behavior: If the request failed, xid should be None or an Exception should be raised.
Actual behavior: An astropy.table.Table is returned, but with HTML content instead of usable columns like 'plate', 'fiberID', 'mjd', etc. This leads to confusing downstream errors.
Suggested improvement: Detect HTML responses and raise a proper error (e.g., RemoteServiceError) instead of wrapping them in a Table. Alternatively, return None or a Table with metadata indicating the failure.
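For illustration, the detection suggested above could look roughly like the sketch below. This is not astroquery's implementation: `check_not_html` and `HTMLResponseError` are hypothetical names, and the check assumes only that the returned object exposes `colnames`, as `astropy.table.Table` does. Inside astroquery itself, `astroquery.exceptions.RemoteServiceError` would presumably be the more natural exception to raise.

```python
class HTMLResponseError(RuntimeError):
    """Hypothetical stand-in for an error such as RemoteServiceError."""


# Prefixes suggesting that the "column names" are really an HTML page.
_HTML_PREFIXES = ("<html", "<!doctype", "<head", "<body")


def check_not_html(table):
    """Return `table` unchanged, or raise if its columns look like HTML.

    Assumes only that `table` has a `colnames` attribute.
    """
    if table is None:
        return None
    for name in table.colnames:
        if str(name).lstrip().lower().startswith(_HTML_PREFIXES):
            raise HTMLResponseError(
                f"Server returned HTML instead of data: {name!r}"
            )
    return table
```

Wrapping the call as `xid = check_not_html(SDSS.query_region(...))` would surface the failure at the query itself rather than as a KeyError later in SDSS.get_spectra().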
Thanks for maintaining Astroquery — this would really help users handle these cases more cleanly!
cc @weaverba137
I tried to reproduce this today:
>>> from astroquery.sdss import SDSS
>>> from astropy.coordinates import SkyCoord
>>> import astropy.units as u
>>> coord = SkyCoord(ra=148.70583333*u.deg, dec=9.27108333*u.deg)
>>> xid = SDSS.query_region(coord, radius=3*u.arcsec, spectro=True, data_release=18)
>>> xid
<Table length=1>
       ra              dec               objid         run  rerun camcol field      z      plate  mjd  fiberID      specobjid      run2d
    float64          float64            uint64        int64 int64 int64  int64   float64   int64 int64  int64         uint64        int64
---------------- ---------------- ------------------- ----- ----- ------ ----- ----------- ----- ----- ------- ------------------- -----
148.706500405545 9.27107701783245 1237661063879000073  3630   301      1   215 0.004862683  1306 52996       5 1470426702990567424    26
I suspect a transient server error, possibly due to maintenance or heavy load.
Thank you for checking!
I still see a documentation and error-handling to-do here for astroquery: raising a reasonably descriptive error instead of the KeyError. I'm not sure it's possible without overcomplicating the code, though.
I agree, but it is tricky. First off, if you can't reproduce the error message returned because the server is actually working, it's going to be hard to intercept. I also don't remember if HTML is what is returned when the query is successful, so interpreting all HTML as an error wouldn't necessarily work either.
I have the full traceback from Jessica in our internal Slack/Fornax, so I will look into the details to see whether there is any low-hanging fruit.
https://github.com/nasa-fornax/fornax-demo-notebooks/issues/437
I can confirm that this bug is present today.
When querying a position in SDSS, there are intermittent service interruptions (presumably from the SDSS side) that are returned by astroquery (v0.4.7) as astropy Tables.
from astroquery.sdss import SDSS
from astropy import units
from astropy.coordinates import SkyCoord
sdss_result = SDSS.query_region(SkyCoord(227.76552, 36.5820, unit=(units.deg,units.deg)), radius=0.05*units.deg, spectro=False, photoobj_fields=['objID', 'ra', 'dec'], timeout=180)
Approximately 1 in every 20 times the output is an astropy.table.table.Table object, with columns <TableColumns names=('<html><body><h1>502 Bad Gateway</h1>')>
<html><body><h1>502 Bad Gateway</h1>
------------------------------------------------------
The server returned an invalid or incomplete response.
</body></html>
Locally I'm able to work around it by clunkily testing that the output object contains a column matching one of the requested fields.
if sdss_result is not None and 'objID' in sdss_result.columns:
    ...  # continue with the valid table
Maybe that would be a good solution to identify this problem internally?
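Since the failures are intermittent, the same column check could also be combined with a retry on the user side. A rough sketch, assuming the query is wrapped in a zero-argument callable and that a `None` result means "no objects found"; `query_with_retry`, `retries`, and `delay` are hypothetical names, not part of astroquery:

```python
import time


def query_with_retry(run_query, required_columns, retries=3, delay=2.0):
    """Call `run_query()` until the result has all `required_columns`.

    `run_query` is any zero-argument callable returning None or an object
    with a `colnames` attribute (e.g. an astropy Table).
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last = None
    for attempt in range(retries):
        last = run_query()
        if last is None:
            # Assumption: None is a legitimate "no objects found" answer.
            return None
        if all(col in last.colnames for col in required_columns):
            return last
        # Likely an HTML error page wrapped in a Table; wait, then retry.
        if attempt < retries - 1:
            time.sleep(delay)
    raise RuntimeError(
        f"No valid result after {retries} attempts; got columns {last.colnames}"
    )
```

With the query above, this might be called as `query_with_retry(lambda: SDSS.query_region(...), ['objID'])`, passing the same arguments to `query_region` as before.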
Thanks for this great service :)