astroquery

bug: MAST cloud data queries require column name change

Open bmorris3 opened this issue 1 year ago • 4 comments

I'm trying to get cloud URIs for MAST data products. I get an error with the code below, and I've included a (commented out) temporary workaround:

from astroquery.mast import Observations

# enable cloud data from AWS
Observations.enable_cloud_dataset()

# find data products
cubes = Observations.query_criteria(target_name='Io', proposal_id=1373, instrument_name='MIRI/IFU')

# # we actually need this line for the last command to work:
# cubes.rename_column('dataURL', 'dataURI')

# get URIs
Observations.get_cloud_uris(cubes)

Tagging @snbianco: is swapping dataURI with dataURL here the correct solution? https://github.com/astropy/astroquery/blob/45b93c0c7013c48151a9319743934ec5c86ae876/astroquery/mast/cloud.py#L145

bmorris3, Jan 03 '25 21:01

Hey Brett! The problem here is that Observations.get_cloud_uris expects a table of data products, not a table of observations. So, you could either add an intermediate step:

# find observations
cubes = Observations.query_criteria(target_name='Io', proposal_id=1373, instrument_name='MIRI/IFU')

# get products
products = Observations.get_product_list(cubes)

# get URIs
Observations.get_cloud_uris(products)

Or you could use the new streamlined workflow by supplying the query criteria directly to get_cloud_uris:

Observations.get_cloud_uris(target_name='Io', proposal_id=1373, instrument_name='MIRI/IFU')

I can see how that error message would be confusing, so I'll make a ticket to handle that exception better. I hope this all makes sense, and let me know if you have any more questions!
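
Until that exception is handled better, a small guard along these lines could catch the mix-up before calling get_cloud_uris. This is only a sketch: check_is_product_table is a hypothetical helper, not part of astroquery, and it assumes (based on the rename workaround above) that observation tables carry a dataURL column while product tables carry dataURI.

```python
def check_is_product_table(colnames):
    """Raise a readable error if the columns look like an observation table.

    Hypothetical helper, not part of astroquery. Assumes product tables
    have a 'dataURI' column and observation tables have 'dataURL' instead.
    """
    names = set(colnames)
    if "dataURI" in names:
        return  # looks like a table of data products
    if "dataURL" in names:
        raise ValueError(
            "This looks like an observation table (it has 'dataURL' but no "
            "'dataURI'). Pass it through Observations.get_product_list first."
        )
    raise ValueError("No 'dataURI' column found; expected a table of data products.")


# With an astropy Table, the column names are available as `table.colnames`:
# check_is_product_table(cubes.colnames)     # would raise for the query result
# check_is_product_table(products.colnames)  # would pass for the product list
```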

snbianco, Jan 03 '25 21:01

Phew, ok thanks!

bmorris3, Jan 03 '25 21:01

Actually @snbianco – I'm trying to load 4 IFU cubes in this example. My snippet above returns four MIRI/IFU data products (CH 1-4). Your revision returns 128 unique data products, including association files, jpg thumbnails, and higher level (derived) data products. These aren't identical.

bmorris3, Jan 03 '25 23:01

Yep, there will normally be multiple data products associated with an observation. You'll want to use Observations.filter_products to pick out only the products you want:

# find observations
cubes = Observations.query_criteria(target_name='Io', proposal_id=1373, instrument_name='MIRI/IFU')

# get products
products = Observations.get_product_list(cubes)

# filter products by selecting only minimum recommended products and those with S3D sub group
filtered = Observations.filter_products(products, mrp_only=True, productSubGroupDescription='S3D')

# get URIs
Observations.get_cloud_uris(filtered)

You could also do this with the streamlined query by passing mrp_only=True and a dictionary of column filters as the filter_products argument:

Observations.get_cloud_uris(target_name='Io', 
                            proposal_id=1373, 
                            instrument_name='MIRI/IFU',
                            mrp_only=True,
                            filter_products={'productSubGroupDescription': 'S3D'})

This should return the 4 URIs that you expect.
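
To make the filtering logic concrete, here is a rough pure-Python sketch of what the two filters are doing. It is not the astroquery implementation. It assumes that mrp_only keeps rows whose productGroupDescription is "Minimum Recommended Products" and that each column filter must match. The rows and file names are made up for illustration.

```python
MRP_GROUP = "Minimum Recommended Products"  # assumed value of the MRP group label


def filter_rows(rows, mrp_only=False, **column_filters):
    """Keep rows that pass the MRP check and every column filter (illustrative only)."""
    kept = []
    for row in rows:
        if mrp_only and row.get("productGroupDescription") != MRP_GROUP:
            continue
        matches = True
        for column, allowed in column_filters.items():
            # Accept either a single string or a collection of allowed values.
            allowed_values = [allowed] if isinstance(allowed, str) else list(allowed)
            if row.get(column) not in allowed_values:
                matches = False
                break
        if matches:
            kept.append(row)
    return kept


# Made-up rows standing in for a product table.
rows = [
    {"productFilename": "example_ch1_s3d.fits",
     "productGroupDescription": MRP_GROUP, "productSubGroupDescription": "S3D"},
    {"productFilename": "example_ch1_other_s3d.fits",
     "productGroupDescription": "", "productSubGroupDescription": "S3D"},
    {"productFilename": "example_ch1_x1d.fits",
     "productGroupDescription": MRP_GROUP, "productSubGroupDescription": "X1D"},
    {"productFilename": "example_asn.json",
     "productGroupDescription": "", "productSubGroupDescription": "ASN"},
]

filtered_rows = filter_rows(rows, mrp_only=True, productSubGroupDescription="S3D")
print([row["productFilename"] for row in filtered_rows])
# prints ['example_ch1_s3d.fits']
```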

snbianco, Jan 04 '25 02:01