
ALMA query returning minimally different duplicate entries for programs

Open privong opened this issue 4 years ago • 4 comments

I am not sure if this is a bug or if I am misunderstanding what astroquery is retrieving/returning from the ALMA archive. In essence, for queries done with astroquery.alma (either on a target name or a position) I am receiving 4x as many rows as the ALMA archive web interface returns.

For example, using the web interface for target 'NGC 7552' returns 3 observations (member ous IDs of: uid://A001/X1320/X62, uid://A001/X133d/X18ca, and uid://A001/X133d/X1871).

However, an astroquery search returns more rows (12), and when considering a single observation ID / member ous ID, the rows differ only in these columns: em_min, em_max, em_res_power, and sensitivity_10kms:

In [1]: from astroquery.alma import Alma

In [2]: res = Alma.query_object('NGC 7552')

In [3]: len(res)
Out[3]: 12

In [4]: p1 = 'uid://A001/X133d/X1871'

In [5]: limobs = res[res['obs_id']==p1]

In [6]: for col in limobs.columns:
   ...:     entries = list(set(limobs[col]))
   ...:     if len(entries) > 1:
   ...:         print(col, "has unique entries")
   ...:         print(limobs[col])
   ...:
em_min has unique entries
        em_min
          m
---------------------
0.0006131220609530008
0.0006112569467662004
0.0006265035056818631
0.0006284629723281648
em_max has unique entries
        em_max
          m
---------------------
0.0006156203061267376
0.0006137399849825776
0.0006291122217032601
0.0006310880662145999
em_res_power has unique entries
   em_res_power
------------------
15584.239401244233
15631.984986636786
15313.520425563484
15265.774840170934
sensitivity_10kms has unique entries
sensitivity_10kms
    mJy / beam
------------------
 506.9967995761073
192.70515542195972
 99.49826361572201
 207.9495367122564

The ALMA archive query reports the smallest of the sensitivity_10kms values.

Are these effectively the values for the different spectral windows within the observation, and is that what astroquery is reporting? And is astroquery merely passing along what the ALMA archive reports?

privong avatar Sep 24 '21 15:09 privong

This seems like an ALMA archive question; I don't think there's any reason astroquery would return something different than what's in the archive, but I find it a little concerning, especially given that this seems to be the same observation reported 4 times.

keflavich avatar Sep 24 '21 17:09 keflavich

I discussed this a bit with @alipnick and he confirmed that the rows correspond to individual spectral windows. This can be seen in the archive result for the member OUS used above. Hovering over the "Frequency Support" result shows the same sensitivity values as I copied above. Or, following directly from the astroquery example I pasted initially:

In [12]: list(set(limobs['frequency_support']))
Out[12]: ['[475.04..477.02GHz,31250.00kHz,207.9mJy/beam@10km/s,18.5mJy/beam@native, XX YY] U 
[476.53..478.52GHz,31250.00kHz,99.5mJy/beam@10km/s,8.9mJy/beam@native, XX YY] U 
[486.98..488.96GHz,31250.00kHz,507mJy/beam@10km/s,45.7mJy/beam@native, XX YY] U 
[488.47..490.45GHz,31250.00kHz,192.7mJy/beam@10km/s,17.4mJy/beam@native, XX YY]']
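For anyone who wants the per-window values without the extra rows, the frequency_support string can be split into its spectral windows. This is only a rough sketch: the function name `parse_frequency_support` and the dictionary keys are made up for illustration, and the pattern assumes the string always has the layout shown in the output above, which may not hold for every ALMA dataset.

```python
import re

# Number like "475.04" or "507" (decimal part optional).
NUM = r"\d+(?:\.\d+)?"

# One bracketed window, assuming the layout seen in the output above.
WINDOW_RE = re.compile(
    rf"\[({NUM})\.\.({NUM})GHz,({NUM})kHz,"
    rf"({NUM})mJy/beam@10km/s,({NUM})mJy/beam@native,\s*([^\]]*)\]"
)


def parse_frequency_support(text):
    """Return one dict per spectral window found in a frequency_support string.

    Hypothetical helper, not part of astroquery.
    """
    windows = []
    for fmin, fmax, res, s10, snat, pol in WINDOW_RE.findall(text):
        windows.append({
            "freq_min_ghz": float(fmin),
            "freq_max_ghz": float(fmax),
            "resolution_khz": float(res),
            "sens_10kms_mjy": float(s10),
            "sens_native_mjy": float(snat),
            "pol": pol.strip(),
        })
    return windows


# The string from the session above, copied verbatim.
sample = (
    "[475.04..477.02GHz,31250.00kHz,207.9mJy/beam@10km/s,18.5mJy/beam@native, XX YY] U "
    "[476.53..478.52GHz,31250.00kHz,99.5mJy/beam@10km/s,8.9mJy/beam@native, XX YY] U "
    "[486.98..488.96GHz,31250.00kHz,507mJy/beam@10km/s,45.7mJy/beam@native, XX YY] U "
    "[488.47..490.45GHz,31250.00kHz,192.7mJy/beam@10km/s,17.4mJy/beam@native, XX YY]"
)
windows = parse_frequency_support(sample)
best_sens = min(w["sens_10kms_mjy"] for w in windows)
```

Here `len(windows)` gives the number of spectral windows, which can be compared against the number of rows returned for that member OUS.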

It seems that the similar, but not duplicate, rows are what is actually in the table accessed via TAP, and that the ALMA archive web interface combines them when presenting the results.

Given that the information that varies among the 4 rows can be reconstructed from the frequency_support information already provided, it might be good for astroquery to return only one row (per unique MOUS?). But that would involve processing the query after it is received, and I understand if that's not desirable. So the current state is that one row is returned per spectral window.
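In the meantime, collapsing to one row per MOUS can be done on the user side. The sketch below is not an astroquery feature: `collapse_by_mous` is a hypothetical helper, and it assumes the `obs_id` and `sensitivity_10kms` column names from the session above. It keeps the row with the smallest sensitivity_10kms for each MOUS, mirroring what the web interface appears to report. Plain dicts stand in for table rows so the example runs without a network query; with a real result you would pass the table itself, assuming the values are not masked.

```python
def collapse_by_mous(rows, key="obs_id", sens_col="sensitivity_10kms"):
    """Keep one row per `key`, choosing the row with the smallest `sens_col`.

    Hypothetical helper, not part of astroquery. `rows` is any iterable of
    objects that support row[column_name] lookups.
    """
    best = {}
    for row in rows:
        mous = row[key]
        # First row seen for this MOUS, or a more sensitive one than before.
        if mous not in best or row[sens_col] < best[mous][sens_col]:
            best[mous] = row
    return list(best.values())


# Stand-in rows using the four sensitivity values quoted earlier in the thread.
p1 = "uid://A001/X133d/X1871"
rows = [
    {"obs_id": p1, "sensitivity_10kms": 506.9967995761073},
    {"obs_id": p1, "sensitivity_10kms": 192.70515542195972},
    {"obs_id": p1, "sensitivity_10kms": 99.49826361572201},
    {"obs_id": p1, "sensitivity_10kms": 207.9495367122564},
]
collapsed = collapse_by_mous(rows)
```

If any single row per MOUS is good enough, `astropy.table.unique(res, keys='obs_id')` is a shorter route; it keeps the first row for each key rather than the most sensitive one.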

As a further check, I've verified that a query for MOUS uid://A001/X121/X308 returns 9 rows, and the Frequency Support indicates 9 spectral windows.

privong avatar Sep 24 '21 20:09 privong

@keflavich Any thoughts on what to do with this? I'm okay closing it as "not a bug" if y'all want to avoid doing any processing of query results before giving them to the user.

privong avatar Jun 29 '22 17:06 privong

I don't think this is a bug, but it's a totally reasonable feature request / feature to add. We provide some tools for "post-processing" the archive-returned values for other archives (e.g., splatalogue), so we could add such a tool to the utils, for example. However, I'd say it's not a huge priority - just something that would be cool to have.

This issue serves as a useful warning, though, so hopefully users confused by getting duplicate entries will hit here.

keflavich avatar Jun 29 '22 18:06 keflavich