
MAST query times out after 600s

Open npirzkal opened this issue 3 years ago • 1 comments

The default config file does not seem to have an entry for [mast], and conf.timeout is set by mast/__init__.py to a hard-coded 600 s. This value is way too small for any JWST-related MAST queries. As this is not well documented, I wanted to report it here.

npirzkal, Jul 22 '22 15:07

The __init__.py file only defines the default value for the configuration item. The configuration file does contain a section for mast, and setting the timeout item in that file takes precedence over the default value.

You might find it helpful to run

>>> from astropy.config import create_config_file
>>> create_config_file("astroquery")

You might have to call create_config_file("astroquery", overwrite=True) instead, but be aware that this will restore default values everywhere.
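As a rough illustration of what the edited section could look like, here is a minimal sketch. It assumes the generated file sits at the usual astropy location (typically ~/.astropy/config/astroquery.cfg) and uses 1800 s purely as an example value. astropy reads the file with its own parser; the standard-library configparser is used here only to check the shape of the [mast] section.

```python
import configparser

# Hypothetical excerpt of astroquery.cfg after editing the [mast] section.
# The 1800 s value is an illustrative assumption, not a recommendation.
SAMPLE_CFG = """
[mast]
# Uncommented and raised from the default of 600 seconds.
timeout = 1800
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CFG)

# Read the override back as an integer number of seconds.
mast_timeout = parser.getint("mast", "timeout")
print(mast_timeout)
```

A new Python session is the safest way to make sure the edited value is picked up.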

eerovaher, Jul 22 '22 16:07

I think this is essentially a problem of how MAST has structured its database and tables (and possibly which columns are indexed), not necessarily a problem with astroquery. For JWST data, when one does:

from astroquery.mast import Observations

obs_table = Observations.query_criteria(proposal_id="1345", instrument_name="nircam")
products = Observations.get_product_list(obs_table)

the MAST database has to do a table join of maybe a quarter million items. JWST data simply has a lot of products per observation, especially in the spectral modes, particularly MOS and grism. Part of this is because two "observations" in MAST can be two objects that come from the same original dataset, i.e. the same _uncal.fits file. Both observations then refer to a common product, so these modes produce massive numbers of duplicates.

So the workaround is to query one observation at a time, collecting each returned table in a list, then vstack the tables into a final table which can then be filtered and used for downloading.

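from astroquery.mast import Observations
from astropy.table import vstack

# Find the matching observations first
obs_table = Observations.query_criteria(proposal_id="1345", instrument_name="nircam")
# Request the products one observation (table row) at a time
product_list = [Observations.get_product_list(obs) for obs in obs_table]
# Combine the per-observation tables into a single table
products = vstack(product_list)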

You may think looping through observations is slow, but it is almost always faster than asking for the tables to be joined in the database on the server side. And you don't get timeouts.
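If a single observation still fails now and then, the loop can be made more forgiving. The following is a minimal sketch, not part of astroquery: fetch_per_item and fake_fetch are hypothetical names, and fake_fetch merely stands in for Observations.get_product_list so the example runs offline.

```python
def fetch_per_item(rows, fetch):
    """Call fetch(row) once per row, keeping successes and failures apart."""
    results = []
    failed = []
    for row in rows:
        try:
            results.append(fetch(row))
        except Exception as exc:  # e.g. a timeout on one observation
            failed.append((row, exc))
    return results, failed


def fake_fetch(obs_id):
    """Hypothetical stand-in for a per-observation product query."""
    if obs_id == "bad":
        raise TimeoutError("simulated timeout")
    return [f"{obs_id}_product"]


tables, failed = fetch_per_item(["obs1", "bad", "obs2"], fake_fetch)
print(len(tables), "succeeded;", len(failed), "failed")
```

With astroquery, the idea would be to pass obs_table as rows and Observations.get_product_list as fetch, vstack the returned tables, and retry only the entries in failed.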

As for the duplicates, these are now culled in astroquery, just as they are if you use the MAST portal.
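Should a table stacked from per-observation calls still contain repeated products (an assumption; the built-in culling may already cover this case), they can be dropped by key. This is a minimal pure-Python sketch with made-up rows: drop_duplicate_products is a hypothetical helper, and using dataURI as the identifying column is an assumption.

```python
def drop_duplicate_products(rows, key="dataURI"):
    """Keep the first row seen for each value of key, preserving order."""
    seen = set()
    unique_rows = []
    for row in rows:
        value = row[key]
        if value not in seen:
            seen.add(value)
            unique_rows.append(row)
    return unique_rows


# Made-up rows: observations A and B both point at the same product.
rows = [
    {"obsID": "A", "dataURI": "mast:JWST/product/x_uncal.fits"},
    {"obsID": "B", "dataURI": "mast:JWST/product/x_uncal.fits"},
    {"obsID": "B", "dataURI": "mast:JWST/product/y_cal.fits"},
]
unique_rows = drop_duplicate_products(rows)
print(len(rows), "->", len(unique_rows))
```

For a real astropy Table, astropy.table.unique with a keys argument should give a similar result.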

jdavies-st, Oct 13 '22 09:10