gamma-astro-data-formats

Change OBS_ID keyword from integer to string

Open jknodlseder opened this issue 8 years ago • 2 comments

The event list extension has an OBS_ID keyword that is currently defined as an integer. This comes from Karl's original event list format specification. It appears, however, that OGIP has since switched to a string; see for example https://heasarc.gsfc.nasa.gov/docs/heasarc/ofwg/docs/general/ogip_94_001/ogip_94_001.html , which says:

OBSERVATION ID: Sequence numbers are commonly used by missions to identify 'uniquely' a particular dataset. The definition of the sequence number is not unique across mission and typically depend upon the different way a dataset is identified within that mission. Although sequence number suggests a numerical value only, past mission had given the sequence number as a mixture of numerical and character values. To avoid a proliferation of keywords to store the sequence number value, the OGIP recommend to use OBS_ID. The keywords value is a string, to allow back compatability with the already archived mission. The OBS_ID keyword value can be used by database software.

So I propose changing the OBS_ID keyword from integer to string.

jknodlseder, Aug 04 '16 20:08

@jknodlseder - A few questions:

  • What are current high-energy missions (Fermi, Chandra, XMM, ...) using? Integer or string for OBS_ID?
  • In tools like Gammalib and Gammapy, would we have to support both integer and string for OBS_ID for a while or forever?
  • For FITS header keywords, there's no difference, right? If the value can be parsed as an int, tools like astropy.io.fits return it as an int. So we're only talking about table column dtypes for EVENTS and OBS_INDEX or other tables that have OBS_ID as a column?
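Regarding the header question: under the FITS standard, the type of a header value is carried by the card syntax itself. A string value is enclosed in single quotes (`OBS_ID  = '23523'`), while an integer is not (`OBS_ID  = 23523`). So a reader that follows the standard should return a string for the first card even though its content looks numeric. Below is a simplified pure-Python sketch of that rule. The helper `parse_card_value` is hypothetical, not part of astropy or any other library, and it ignores CONTINUE cards, complex values and other details of the standard.

```python
def parse_card_value(card: str):
    """Return (keyword, value) for a simplified 80-column FITS header card.

    Handles quoted strings, logicals, integers and floats, and drops
    inline comments after '/'. A sketch, not a complete FITS parser.
    """
    card = card.ljust(80)
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        raise ValueError("not a value card")
    field = card[10:].strip()
    if field.startswith("'"):
        # Character string: ends at the first single quote that is not doubled
        # ('' inside a string is an escaped quote).
        chars = []
        i = 1
        while i < len(field):
            if field[i] == "'":
                if i + 1 < len(field) and field[i + 1] == "'":
                    chars.append("'")
                    i += 2
                    continue
                break
            chars.append(field[i])
            i += 1
        # Trailing blanks in a FITS string value are not significant.
        return keyword, "".join(chars).rstrip()
    # Non-string value: strip an inline comment introduced by '/'.
    token = field.split("/", 1)[0].strip()
    if token in ("T", "F"):
        return keyword, token == "T"
    try:
        return keyword, int(token)
    except ValueError:
        # FITS allows 'D' as the exponent marker for double precision.
        return keyword, float(token.replace("D", "E"))


print(parse_card_value("OBS_ID  =                23523 / Observation ID"))  # ('OBS_ID', 23523)
print(parse_card_value("OBS_ID  = '23523   '           / Observation ID"))  # ('OBS_ID', '23523')
```

Under this reading, the integer/string choice would be visible in headers too, not only in table column dtypes. In binary table columns, the corresponding choice would be between an integer TFORM such as `K` (64-bit) and a character TFORM such as `8A`.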

Just to give the counter-argument why we might want to keep int:

  • This is a breaking change. Current files use int and science tools only support int: https://github.com/gammapy/gammapy/search?utf8=%E2%9C%93&q=OBS_ID https://github.com/gammalib/gammalib/search?utf8=%E2%9C%93&q=OBS_ID
  • The quote you cite says that an integer would be nicer, and that a string is only allowed because older missions used strings.
  • One use case for an integer OBS_ID that I have used myself: merge the events from multiple observations (all for HESS 1) into one event list, then join it with event lists from other reconstructions and cuts to study differences in event reconstruction and cuts. Of course, this can be done differently.
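The merge-and-join workflow in the last point can be sketched as follows. This is a minimal illustration under our own assumptions, not Gammapy code. Events are plain dicts, `index_events` and `join_reconstructions` are hypothetical helpers, the values are made up, and EVENT_ID is assumed to be unique only within one observation. Under that assumption, the join key has to be the pair (OBS_ID, EVENT_ID). The pair works with either an integer or a string OBS_ID, as long as both event lists use the same type.

```python
from typing import Dict, List, Tuple

# An event is a plain dict of column name -> value (illustrative, not a FITS table).
Event = Dict[str, object]


def index_events(events: List[Event]) -> Dict[tuple, Event]:
    """Index events by (OBS_ID, EVENT_ID); EVENT_ID alone may repeat across observations."""
    return {(ev["OBS_ID"], ev["EVENT_ID"]): ev for ev in events}


def join_reconstructions(a: List[Event], b: List[Event]) -> List[Tuple[Event, Event]]:
    """Inner join of two event lists on (OBS_ID, EVENT_ID), keeping the order of `a`."""
    b_index = index_events(b)
    pairs = []
    for ev in a:
        key = (ev["OBS_ID"], ev["EVENT_ID"])
        if key in b_index:
            pairs.append((ev, b_index[key]))
    return pairs


# Made-up example: two reconstructions of the same merged events from two observations.
reco_a = [
    {"OBS_ID": 23523, "EVENT_ID": 1, "ENERGY": 1.2},
    {"OBS_ID": 23526, "EVENT_ID": 1, "ENERGY": 0.8},
]
reco_b = [
    {"OBS_ID": 23526, "EVENT_ID": 1, "ENERGY": 0.9},
    {"OBS_ID": 23523, "EVENT_ID": 1, "ENERGY": 1.1},
]
matched = join_reconstructions(reco_a, reco_b)
print(len(matched))  # 2

# Same data, but reco_b stores OBS_ID as a string: 23523 != "23523" in Python,
# so the join silently finds nothing.
reco_b_str = [{**ev, "OBS_ID": str(ev["OBS_ID"])} for ev in reco_b]
print(len(join_reconstructions(reco_a, reco_b_str)))  # 0
```

The last two lines show one concrete risk of a period in which some files store `23523` and others `'23523'`. Joins like this would not fail loudly; they would just return fewer matches.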

My impression is that int is simpler, more convenient, and already in place, so I don't see the advantage of making the change.

@jknodlseder - if you're available, I'll put this on the agenda for next Tuesday's IACT DL3 telcon: https://github.com/open-gamma-ray-astro/2016-04_IACT_DL3_Meeting/blob/master/notes/2016-09-06-IACT_DL3_Telcon.md

cdeil, Sep 02 '16 14:09

@jknodlseder - Do you still think CTA / IACT DL3 data should change OBS_ID to be a string instead of an int?

It would be quite a big breaking change: in Gammapy at least, we work with OBS_ID as an integer in several places, and presumably you do in ctools as well?
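If the change were adopted, one way a tool could soften the break during a transition period is to accept both types on input and normalize them to one representation. The sketch below assumes that string is the target type. `normalize_obs_id` is a hypothetical helper and is not part of Gammapy or ctools. It only handles plain Python `int` and `str`, so NumPy integer scalars, for example, would need an extra conversion.

```python
from typing import Union


def normalize_obs_id(value: Union[int, str]) -> str:
    """Return OBS_ID as a canonical string, accepting either an integer or a string.

    Integers become their decimal representation; strings lose trailing
    blanks, which are not significant in FITS string values.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("OBS_ID must be an int or str, not bool")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        text = value.rstrip()
        if not text:
            raise ValueError("empty OBS_ID")
        return text
    raise TypeError(f"unsupported OBS_ID type: {type(value).__name__}")


print(normalize_obs_id(23523))                                 # 23523
print(normalize_obs_id("23523   ") == normalize_obs_id(23523))  # True
```

Leading zeros are deliberately kept: `"0023523"` and `23523` normalize to different strings, because a string OBS_ID may use them meaningfully. Whether that is the right choice depends on how the spec defines the string form.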

@kosack - Is there a plan for this in CTA already? Should discussions continue here on the DL3 spec, or is there a different spec / process now in CTA?

cdeil, Aug 26 '18 17:08