gamma-astro-data-formats
Change OBS_ID keyword from integer to string
The event list extension has an OBS_ID keyword that is currently defined as integer. This comes from the original event list format specification from Karl. It appears however that OGIP has in the meantime switched to a string, see for example https://heasarc.gsfc.nasa.gov/docs/heasarc/ofwg/docs/general/ogip_94_001/ogip_94_001.html which says:
OBSERVATION ID: Sequence numbers are commonly used by missions to identify 'uniquely' a particular dataset. The definition of the sequence number is not unique across mission and typically depend upon the different way a dataset is identified within that mission. Although sequence number suggests a numerical value only, past mission had given the sequence number as a mixture of numerical and character values. To avoid a proliferation of keywords to store the sequence number value, the OGIP recommend to use OBS_ID. The keywords value is a string, to allow back compatability with the already archived mission. The OBS_ID keyword value can be used by database software.
So I would propose to change the OBS_ID keyword from integer to string.
@jknodlseder - A few questions:
- What are current high-energy missions (Fermi, Chandra, XMM, ...) using? Integer or string for OBS_ID?
- In tools like Gammalib and Gammapy, would we have to support both integer and string for OBS_ID for a while or forever?
- For FITS header keywords, there's no difference, right? If the value can be parsed as an int, tools like astropy.io.fits return it as an int. So we're only talking about table column dtypes for EVENTS and OBS_INDEX or other tables that have OBS_ID as a column?
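On the question of supporting both types for a while: one possible approach is to normalize OBS_ID to a single type at the point where files are read. The following is only a minimal sketch of that idea in plain Python. The helper name normalize_obs_id is hypothetical and is not part of Gammapy, Gammalib or astropy.

```python
import numbers


def normalize_obs_id(value):
    """Return OBS_ID as a string, accepting integer, str or bytes input.

    Hypothetical helper for illustration only; not an existing API.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("OBS_ID must be an integer or a string, not bool")
    # Fixed-width string columns are often exposed as bytes.
    if isinstance(value, bytes):
        value = value.decode("ascii")
    # numbers.Integral covers Python int and other registered integer types.
    if isinstance(value, numbers.Integral):
        return str(int(value))
    if isinstance(value, str):
        # Fixed-width columns may carry padding blanks.
        return value.strip()
    raise TypeError(f"Unsupported OBS_ID type: {type(value).__name__}")


old_style = normalize_obs_id(23523)       # OBS_ID from a file with an integer column
new_style = normalize_obs_id("23523   ")  # OBS_ID from a file with a string column
assert old_style == new_style
```

One caveat with this design: leading zeros are kept, so a string OBS_ID of "0023" would not compare equal to an integer OBS_ID of 23.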
Just to give the counter-argument why we might want to keep int:
- This is a breaking change. Current files use int and science tools only support int: https://github.com/gammapy/gammapy/search?utf8=%E2%9C%93&q=OBS_ID https://github.com/gammalib/gammalib/search?utf8=%E2%9C%93&q=OBS_ID
- The quote you cite says that int would be nicer, and this is only allowed as string because older missions had string.
- One use case I could think of for int OBS_ID that I've used myself is to merge events from multiple observations (all for HESS 1) into one event list and then join that with event lists from other recos and cuts to study event reconstruction differences and cut differences. Of course it can be done differently.
My impression is that int is simpler and more convenient and is in place, so I don't see the advantage of doing the change.
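The merge-and-join use case mentioned above can be sketched roughly as follows. This is an illustration in plain Python, not the code actually used: it assumes each event carries OBS_ID and EVENT_ID, that the pair identifies an event across reconstructions, and all event values below are invented.

```python
def join_events(left, right):
    """Pair up events that share the same (OBS_ID, EVENT_ID) key."""
    # Index the right-hand list by its composite key for O(1) lookups.
    right_index = {(e["OBS_ID"], e["EVENT_ID"]): e for e in right}
    joined = []
    for event in left:
        key = (event["OBS_ID"], event["EVENT_ID"])
        if key in right_index:
            joined.append((event, right_index[key]))
    return joined


# Invented example: events merged from two observations, for two recos.
reco_a = [
    {"OBS_ID": 23523, "EVENT_ID": 1, "ENERGY": 1.2},
    {"OBS_ID": 23523, "EVENT_ID": 2, "ENERGY": 0.8},
    {"OBS_ID": 23526, "EVENT_ID": 1, "ENERGY": 3.1},
]
reco_b = [
    {"OBS_ID": 23523, "EVENT_ID": 2, "ENERGY": 0.9},
    {"OBS_ID": 23526, "EVENT_ID": 1, "ENERGY": 2.7},
    {"OBS_ID": 23526, "EVENT_ID": 7, "ENERGY": 5.0},
]

matched = join_events(reco_a, reco_b)
```

The join only needs the key to be hashable and comparable, so the same sketch also runs with string OBS_ID values. What would change is sorting behaviour and the memory footprint of the key column in large merged tables.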
@jknodlseder - if you're available I'll put this on the agenda for next Tuesday's IACT DL3 telcon: https://github.com/open-gamma-ray-astro/2016-04_IACT_DL3_Meeting/blob/master/notes/2016-09-06-IACT_DL3_Telcon.md
@jknodlseder - Do you still think CTA / IACT DL3 data should change OBS_ID to be a string instead of an int?
It would be quite a big breaking change: at least in Gammapy we work with OBS_ID as an integer in several places, and presumably you do in ctools as well?
@kosack - Is there a plan for this in CTA already? Should discussions continue here on the DL3 spec, or is there a different spec / process now in CTA?