
Defining a common XML format for models in Fermi/LAT Science Tools and GammaLib

Open jknodlseder opened this issue 8 years ago • 6 comments

GammaLib currently implements the XML model definition format that was defined for the Fermi/LAT Science Tools (ST). GammaLib has several additional spatial model components, and the XML format has been extended to cover these cases as well. More spatial models are now also being added to the Fermi/LAT Science Tools, so to keep GammaLib and the Science Tools compatible it is important to agree now on a common naming convention. This could also be the moment to change some of the initial naming conventions so that model components have a more coherent set of names. It is proposed that the Fermi/LAT Science Tools and GammaLib implement proxies to ease the transition from the old to the new format, but ultimately only the new format should be used for model definition.

The discussion on this started on the Fermi/LAT Confluence and in an e-mail thread; here is just a summary of what has been proposed so far.

The proposal is to change the type attributes of the spatialModel elements in the XML file (see an example XML file below). The following types are proposed:

  • PointSource as replacement of SkyDirFunction in ST and GammaLib
  • RadialDisk as replacement for SpatialDisk in ST and DiskFunction in GammaLib
  • RadialGaussian as replacement for SpatialGaussian in ST and GaussFunction in GammaLib
  • RadialShell as replacement for ShellFunction in GammaLib
  • EllipticalDisk stays as is in GammaLib
  • EllipticalGaussian as replacement for EllipticalGauss in GammaLib
  • EllipticalShell if needed in the future
  • DiffuseIsotropic as replacement for ConstantValue in ST and GammaLib
  • DiffuseMap as replacement for SpatialMap in ST and GammaLib
  • DiffuseMapCube as replacement for MapCubeFunction in ST and GammaLib
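The renaming listed above can be written down as a lookup table, which is roughly what a transition proxy would need. This is a minimal sketch in Python; the names `LEGACY_SPATIAL_TYPES` and `to_proposed_spatial_type` are illustrative and are not part of the Science Tools or GammaLib.

```python
# Legacy spatialModel type names mapped to the proposed names.
# Comments note which package uses the legacy name, as listed in this issue.
LEGACY_SPATIAL_TYPES = {
    "SkyDirFunction": "PointSource",          # ST and GammaLib
    "SpatialDisk": "RadialDisk",              # ST
    "DiskFunction": "RadialDisk",             # GammaLib
    "SpatialGaussian": "RadialGaussian",      # ST
    "GaussFunction": "RadialGaussian",        # GammaLib
    "ShellFunction": "RadialShell",           # GammaLib
    "EllipticalGauss": "EllipticalGaussian",  # GammaLib
    "ConstantValue": "DiffuseIsotropic",      # ST and GammaLib
    "SpatialMap": "DiffuseMap",               # ST and GammaLib
    "MapCubeFunction": "DiffuseMapCube",      # ST and GammaLib
}


def to_proposed_spatial_type(type_name):
    """Return the proposed name for a legacy type; other names pass through unchanged."""
    return LEGACY_SPATIAL_TYPES.get(type_name, type_name)


print(to_proposed_spatial_type("SpatialDisk"))
print(to_proposed_spatial_type("EllipticalDisk"))
```

Names that are already in the proposed form (such as EllipticalDisk) pass through untouched, so the same function could be applied to old and new files alike.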

In addition, the type attributes of the source elements in the XML file are proposed to be defined as follows:

  • PointSource for point sources (as is)
  • ExtendedSource for all radial and elliptical models (is DiffuseSource in ST)
  • DiffuseSource for all diffuse models (as is)
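With the proposed names, the source type could be derived from the spatialModel type. The sketch below assumes that the name prefixes (Radial, Elliptical, Diffuse) are a reliable way to group the models; that rule and the function name `source_type_for` are assumptions made for illustration, not something the proposal states.

```python
def source_type_for(spatial_type):
    """Pick the proposed source type from a proposed spatialModel type name."""
    if spatial_type == "PointSource":
        return "PointSource"
    # Assumption: all radial and elliptical models share these name prefixes.
    if spatial_type.startswith(("Radial", "Elliptical")):
        return "ExtendedSource"
    # Assumption: all diffuse models start with "Diffuse".
    if spatial_type.startswith("Diffuse"):
        return "DiffuseSource"
    raise ValueError(f"unknown spatialModel type: {spatial_type!r}")


for name in ("PointSource", "RadialGaussian", "EllipticalDisk", "DiffuseMapCube"):
    print(name, "->", source_type_for(name))
```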

To illustrate the type attributes and the source and spatialModel elements, here is an example in the current XML file format:

  <source name="Crab" type="PointSource">
    <spectrum type="PowerLaw">
       <parameter name="Prefactor" scale="1e-16" value="5.7"  min="1e-07" max="1000.0" free="1"/>
       <parameter name="Index"     scale="-1"    value="2.48" min="0.0"   max="+5.0"   free="1"/>
       <parameter name="Scale"     scale="1e6"   value="0.3"  min="0.01"  max="1000.0" free="0"/>
    </spectrum>
    <spatialModel type="SkyDirFunction">
      <parameter name="RA"  scale="1.0" value="83.6331" min="-360" max="360" free="0"/>
      <parameter name="DEC" scale="1.0" value="22.0145" min="-90"  max="90"  free="0"/>
    </spatialModel>
  </source>
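For this example, moving to the proposed format only changes the type attribute of the spatialModel element. A hedged sketch of that rewrite using Python's standard `xml.etree.ElementTree` follows; it is not how either package implements its proxy, and the spectrum block is shortened to one parameter to keep the snippet small.

```python
import xml.etree.ElementTree as ET

OLD_XML = """
<source name="Crab" type="PointSource">
  <spectrum type="PowerLaw">
    <parameter name="Index" scale="-1" value="2.48" min="0.0" max="+5.0" free="1"/>
  </spectrum>
  <spatialModel type="SkyDirFunction">
    <parameter name="RA"  scale="1.0" value="83.6331" min="-360" max="360" free="0"/>
    <parameter name="DEC" scale="1.0" value="22.0145" min="-90"  max="90"  free="0"/>
  </spatialModel>
</source>
"""

source = ET.fromstring(OLD_XML)
spatial = source.find("spatialModel")

# SkyDirFunction becomes PointSource under the proposal; parameters are untouched.
if spatial.get("type") == "SkyDirFunction":
    spatial.set("type", "PointSource")

new_xml = ET.tostring(source, encoding="unicode")
print(new_xml)
```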

jknodlseder • Apr 26 '16 19:04

@jknodlseder - Thank you for taking the initiative on this!

This doesn't just affect Gammalib and the Fermi ST as mentioned in the title and description, but also Gammapy and pointlike and 3ML and maybe Naima and Gamera as astro modeling codes and other science tool codes that have a use for model specification or serialisation.

So @woodmd @zblz @joleroi @adonath @registerrier @giacomov @tburnett -- please comment here!

@taldcroft - With the Astropy / Sherpa bridge, is there interest on a shared model specification / serialisation spec like this one or are there even plans to develop a more general one?


Here's some first thoughts from me:

  • Personally I find YAML easier to read and write than XML and would prefer it. But that's not the most important decision and I think (but am not sure) both YAML and XML are equally powerful, so this is mostly a matter of taste.
  • This format might be the sweet spot ... it covers a lot of models people want to specify, and it's still a reasonably simple, human-editable format.
  • But it's clearly limited and non-perfect. I think e.g. a multiplicative absorption line can't be specified in this format, right? Neither can linked parameters. And the diffuse models require external FITS files, i.e. some model specifications require multiple files, it would be nicer to be able to have any model specified in one file.
  • To me it's not clear yet whether this format is too limiting for important use cases, or how extensible it is when it comes to storing extra information like fit results (the fitted model is only a subset of the info one wants to store when serialising model fit results; the TS values and covariance matrix are other examples).

My guess is that we'll end up adopting almost exactly what is proposed now, because it's a small refinement / cleanup of something that's already proven and widely used.

Does anyone think we should try to extend the format in a significant way (e.g. multiplicative models or linked parameters)?

Or should we just put this in basically as proposed now?

cdeil • Apr 28 '16 22:04

While I agree that it would be good to adhere to the same XML format between the Fermi ST and other software, in the long run I'd rather move to something more user-friendly and up to date than XML.

As far as XML is concerned, the proposed extension looks good, but I don't think it is possible at this stage to change anything which would break backward compatibility with the already existing XML files for the Fermi ST.

For the future, my preference would be for YAML or JSON, which are the formats we were already talking about here:

https://github.com/open-gamma-ray-astro/gamma-astro-data-formats/issues/1

For 3ML/astromodels I adopted YAML (but moving to JSON would be immediate: everything is a dictionary, so it is just a matter of changing which library is called to serialize and de-serialize).

The reason I am implementing a new format is that I need something that goes well beyond what is currently possible with the XML defined for the Fermi ST: links among parameters, physical units, time-varying parameters and so on.

An example of the YAML format I'm using (which is still in development, and it is inspired by what came out from the workshop in Heidelberg) can be found here:

https://gist.github.com/giacomov/4484a43e290520cf5020e64c91eec9ab

A more elaborate example, using a Naima model with parameters depending on time, can be found here:

https://gist.github.com/giacomov/a068c4521cb72d922a6919ab87bb118f

Everything is of "pure types" (int, float, strings) so the same file can be read by a software which has a different class structure than astromodels. Also, the format is very much inspired by the python mantra "explicit is better than implicit", so that it should be easy to interpret just by reading it.
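To illustrate the "pure types" point, here is a minimal sketch of a model description built only from dicts, lists, strings, floats and booleans, round-tripped through JSON. The key names are hypothetical and do not reproduce the astromodels format linked above.

```python
import json

# Hypothetical model description; only built-in container and scalar types.
model = {
    "sources": [
        {
            "name": "Crab",
            "spatial": {"type": "PointSource", "ra": 83.6331, "dec": 22.0145},
            "spectrum": {
                "type": "PowerLaw",
                "index": {"value": 2.48, "free": True},
            },
        }
    ]
}

text = json.dumps(model, indent=2)   # serialize to a JSON string
restored = json.loads(text)          # parse it back into plain Python objects

print(restored == model)
```

Because nothing here depends on a particular class hierarchy, any reader that understands the keys can rebuild its own objects. With PyYAML installed, swapping `json.dumps`/`json.loads` for `yaml.safe_dump`/`yaml.safe_load` should be the only change needed to switch formats.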

I know that there are other discussions on these things, so please tell me if I should post these details in other places instead of this one. I confess that I am a bit lost with all these repositories...

giacomov • Apr 28 '16 23:04

For complementary information:

GammaLib in fact uses an extension of the Fermi/LAT format that covers some of the use cases you just described, @giacomov. For time-variable models, a third block is added to a source block in the XML format:

<temporalModel type="Constant">
  <parameter name="Normalization" scale="1.0" value="1.0" min="0.1" max="10.0" free="0"/>
</temporalModel>

(see http://cta.irap.omp.eu/gammalib/user_manual/modules/model.html#sky-models for a complete description of the model types). I have not checked though whether the Fermi/LAT Science Tools simply ignore these extensions, or whether they stop with an error. If the temporal extension is not given in GammaLib, a constant model will be assumed by the software.
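The "constant model if absent" behaviour can be mimicked on the XML level. The sketch below is not GammaLib code; the helper name `ensure_temporal_model` is made up, and the default parameter values are simply copied from the temporalModel block shown above.

```python
import xml.etree.ElementTree as ET


def ensure_temporal_model(source):
    """Append a constant <temporalModel> to a <source> element if it has none."""
    temporal = source.find("temporalModel")
    if temporal is None:
        temporal = ET.SubElement(source, "temporalModel", {"type": "Constant"})
        # Assumed defaults, taken from the example block in this comment.
        ET.SubElement(temporal, "parameter", {
            "name": "Normalization", "scale": "1.0", "value": "1.0",
            "min": "0.1", "max": "10.0", "free": "0",
        })
    return temporal


source = ET.fromstring('<source name="Crab" type="PointSource"/>')
temporal = ensure_temporal_model(source)
print(temporal.get("type"))
```

Calling the helper a second time leaves the element as it is, so it can be applied to files with and without the temporal block.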

Another functionality is the addition of instrument and id attributes that allow linking of model components to specific observations, see http://cta.irap.omp.eu/ctools/user_manual/getting_started/beyond_combine.html, http://cta.irap.omp.eu/ctools/user_manual/getting_started/beyond_irf.html and http://cta.irap.omp.eu/ctools/user_manual/getting_started/beyond_model.html.

Linking of parameters is foreseen following a similar logic, but is not yet implemented. A couple of years ago I also thought about how to implement parameter functions in the XML format (see https://cta-redmine.irap.omp.eu/projects/gammalib/wiki/GModel), but I have not pushed this forward either.

jknodlseder • Apr 29 '16 05:04

Just some quick comments related to the discussion so far:

  • Concerning Sherpa and astropy modeling, this model definition language would need to support serialization of arbitrary model expressions like a * (b + c * (d - a)) with linked parameters. This could be done by defining a list of model components and then separately a model expression, or it can be done like Sherpa does internally by using a nested tree structure with binary operator model components.
  • Astropy.modeling also supports model composition and concatenation (http://astropy.readthedocs.io/en/latest/modeling/compound-models.html#operators)
  • I don't think of XML as being hand-editable or human readable. It's fine for computers, lousy for humans in real-use situations. Thus if the existing XML format would be constraining for a broader model definition format then I would vote for YAML. Note one important distinction between YAML and JSON is that JSON does not support integer key values, they end up as a string. In terms of serializing arbitrary data structures this can be an annoyance.
  • The examples by @giacomov include key values like source_1 (point source):. I would recommend against encapsulating multiple pieces of information (source identifier and spatial model) into a string expression that needs to be parsed. Just use additional keys within the source structure.

taldcroft • Apr 29 '16 12:04

OK, so there's concerns about this XML model spec format and ideas to make better model spec formats. Personally I agree that YAML would be nicer both as a serialisation and end-user format and I would have more comments e.g. on model parametrisation before this goes in. But others will have gripes with YAML and whatever scheme someone comes up with ("too complex", "doesn't support my use case", "doesn't use the model parametrisation I want").

How should we proceed?

A) Continue discussing and prototyping for another year?
B) Allow this XML model spec to go in with some review, with a statement at the top that it's only one proposed solution and it's not yet agreed on by the community as the one true model spec format.

I prefer B, and encourage others to propose alternative specs. Sure, multiple specs for roughly the same thing are bad. But progress requires someone writing down the proposals in detail and then an analysis of pros and cons, and I don't think A will yield good progress.

@jknodlseder - I have some questions for you:

  • Do you think B would be useful? Or do you think allowing alternative model spec proposals is a bad idea and we shouldn't put work in progress things in the spec at all?
  • Do you see this XML format mostly as a serialisation format, with some friendlier second model spec format to be added to ctools? Or do you think it's an appropriate format for users to read and write, and want to continue evolving it to cover more complex models (or even links to IRFs)?

cdeil • May 10 '16 21:05

There's a thread on this on the ctools mailing list today.

@jknodlseder Is it possible to make the ctools mailing list archives public so that we can link to this?

cdeil • Jul 22 '16 07:07