astroquery icon indicating copy to clipboard operation
astroquery copied to clipboard

heasarc: Adding extra parsing for non-fits standard units

Open bsipocz opened this issue 2 years ago • 4 comments

This is to make the module compatible with the more strict unit parsing in astropy 5.1.

See details and example code in #2333

I'm still running into an ERG related warning at writing out time:

WARNING: UnitsWarning: 'ERG/S/CM^2' did not parse as fits unit: At col 0, Unit 'ERG' not supported by the FITS standard. Did you mean EG, ER, Eg or erg? If this is meant to be a custom unit, define it with 'u.def_unit'. To have it recognized inside a file reader or other code, enable it with 'u.add_enabled_units'. For details, see https://docs.astropy.org/en/latest/units/combining_and_defining.html [astropy.units.core]

And unfortunately, it seems that some of the parsing gone very wrong, too, e.g. S understandably parsed as u.S rather than u.second.

I'm copying the first row here for troubleshooting:

<Row index=0>
         NAME             RA       DEC    COUNT_RATE COUNT_RATE_ERROR    FLUX    FLUX_ERROR HARDNESS_RATIO     ALT_NAME    AP_EXPOSURE    BII     COUNTS COUNTS_ERROR ERROR_RADIUS EXPOSURE HARDNESS_RATIO_NEG_ERR HARDNESS_RATIO_POS_ERR HB_AP_EXPOSURE HB_COUNT_RATE HB_COUNT_RATE_ERROR HB_COUNTS HB_COUNTS_ERROR HB_EXPOSURE  HB_FLUX   HB_FLUX_ERROR  HB_SNR    LII     SB_AP_EXPOSURE SB_COUNT_RATE SB_COUNT_RATE_ERROR SB_COUNTS SB_COUNTS_ERROR SB_EXPOSURE  SB_FLUX   SB_FLUX_ERROR  SB_SNR   SNR   SOURCE_NUMBER                  SEARCH_OFFSET_                
                         deg       deg      ct / S        ct / S      ERG/S/CM^2 ERG/S/CM^2                                     S         deg       ct        ct         arcsec       S                                                         S            ct / S           ct / S           ct           ct            S      ERG/S/CM^2   ERG/S/CM^2             deg           S            ct / S           ct / S           ct           ct            S      ERG/S/CM^2   ERG/S/CM^2                                                                               
       bytes21         float64   float64   float64       float64       float64    float64      float64         bytes15       float64    float64  float64   float64      float64    float64         float64                float64            float64        float64          float64        float64      float64       float64    float64      float64    float64  float64      float64        float64          float64        float64      float64       float64    float64      float64    float64 float64     int64                         bytes47                    
--------------------- ---------- -------- ---------- ---------------- ---------- ---------- -------------- --------------- ----------- --------- ------- ------------ ------------ -------- ---------------------- ---------------------- -------------- ------------- ------------------- --------- --------------- ----------- ---------- ------------- ------- ---------- -------------- ------------- ------------------- --------- --------------- ----------- ---------- ------------- ------- ------- ------------- -----------------------------------------------
CXOC J100036.9+022614 150.154020 2.437240   3.52e-04         4.96e-05   4.74e-15   6.69e-16           0.00 [ECV2009] 989             0 42.284067     0.0          0.0         1.09   162070                   0.00                   0.00              0      0.00e+00            2.72e-04       0.0             0.0           0   0.00e+00      7.13e-15    0.00 236.594835              0      3.28e-04            4.45e-05       0.0             0.0      161300   1.79e-15      2.43e-16    7.37    7.09           989 16.648 (150.01000622572633,2.1999950476067367)

I'm cc-ing @mhvk from the units side of the problem and @taldcroft as Tables and Chandra expert to see whether there are better approaches than add_enabled_aliases for this problem. I'm certain similar issues may come up in other modules, after all hardly any data providers follow the fits standards.

bsipocz avatar Mar 29 '22 10:03 bsipocz

add_enabled_aliases was added in astropy 4.3, so we'll need a different approach for older version, as I don't think it's a serious enough reason to drop support just yet.

bsipocz avatar Mar 30 '22 01:03 bsipocz

@bsipocz - I don't know of any better approaches. I can only apologize for the original catalog not being FITS compliant since I was involved in that project.

I am surprised that HEASARC does not validate input tables at the point of ingest since that would be the right place to make sure their archive is clean, but clearly that is water under the bridge at this point.

taldcroft avatar Mar 30 '22 11:03 taldcroft

Hmm, another approach is to ditch the fits file format and switch to votable. The units come up properly in that case, so after all this might be a heasarc API issue during the process of converting the DB response to the desired output format, rather than an issue with the original data holding and ingestion.

That would not address the desire in #2333 to write out the table in fits format. 1) we either fix all the issues to support that or 2) advocate for astronomers to drop the usage of fits for catalogues in favour of votable.

Option 2 would certainly make life much much easier 😅

(Issues with saving as fits are:

  • name columns are parsed as object dtype, so we need to convert them to str manually
  • description is too long, so we both get a complaint about the usage of HIERARCH cards, but then a ValueError, too for it's being too long
  • multiple warnings about e.g. multiple slashes in units
  • non-parsing units

cc @volodymyrss and @zoghbi-a to see whether they have any recommendations. Long term I suppose it would be nice to switch to a VO backend, but for now, a workaround PR like the one proposed would fix the annoyances that came yup in the fornax use case.

bsipocz avatar Mar 30 '22 22:03 bsipocz

I think also switch to VOTable of heasarc is also not too complex: https://github.com/astropy/astroquery/pull/2353 The problem is only that fields get renamed, mostly capitalization changes, but not only. This can break things for the users. But maybe it's acceptable at some point.

volodymyrss avatar Mar 31 '22 08:03 volodymyrss

This PR was a deadend and as the module is now being refactored, I'm closing it without merging. The issue for the bug remains open and we should check if it's still present when the refactoring is done.

bsipocz avatar May 07 '24 06:05 bsipocz