gamma-cat
Add data from DESY light curve archive
This is a reminder issue that we should add the data from the DESY light curve archive: https://astro.desy.de/gamma_astronomy/magic/projects/light_curve_archive/index_eng.html
Paper: http://adsabs.harvard.edu/abs/2010A%26A...524A..48T
Contact: Elisa Bernardini
They define a "simple lightcurve format".
I think we should use something similar, but more consistent with the other formats from the gamma-astro-data-formats specs.
Format discussions should go here: https://github.com/open-gamma-ray-astro/gamma-astro-data-formats/pull/61#issuecomment-244735482
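To make this concrete, here is a purely illustrative sketch of what such a uniform lightcurve table could look like as ECSV, written with astropy. The column names and units are my assumptions, not the format agreed in that discussion:

```python
from astropy.table import Table
import astropy.units as u

# Illustrative lightcurve table; column names/units are assumptions
lc = Table()
lc["time_min"] = [53945.1, 53946.1]  # MJD, start of each time bin
lc["time_max"] = [53945.2, 53946.2]  # MJD, end of each time bin
lc["flux"] = [2.1e-11, 1.8e-11] * u.Unit("cm-2 s-1")
lc["flux_err"] = [0.3e-11, 0.4e-11] * u.Unit("cm-2 s-1")

# ECSV preserves the units and column metadata
lc.write("example-lc.ecsv", format="ascii.ecsv", overwrite=True)
```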
@wegenmat - Welcome to GitHub!
Thanks. Looking forward to our collaboration.
This was partly implemented by @wegenmat in #26.
EDIT: task list removed here, added by @wegenmat in a copy below.
@cdeil Q: Change fluxes in the input files to use proper units, as given in the papers, not Crab units.
A: Might not be possible for all data :( It seems like the original data files got lost after Martin left, and part of his account is not available any more. I will talk to Elisa and we'll try to recover them.

Q: Some of the data doesn't seem to have a `paper_id` (see e.g. here). How should we handle this?
A: The VERITAS/Whipple data which have only the web page ref are from the Whipple monitoring web page, which is not active any more. Actually, we should probably contact the VERITAS collaboration before making them public. No clue how to handle this. Maybe give a generic string `Whipple_monitoring` as `paper_id`.
@Konstancja - That's pretty cool that you got that username, and I also like your avatar! :-)
For units:
Do you agree that using proper units for fluxes is better than Crab units? (For Crab fluxes there are many different references in use, so you could find spurious AGN variability that comes purely from misunderstandings about which reference was used.)
If yes, then I'd say let's just do the best we can: where we have fluxes in proper units, we use those. Where we only have fluxes in Crab units and the dataset is one you'd like to keep, we keep Crab units in the `input` folder, and then convert to proper units in the "as uniform as possible" `output` files that we give to users, choosing the Crab reference as well as possible.
There are pros and cons to applying the Crab to proper flux conversion either:
- in the scripts that generate the files in `input` that @wegenmat writes, or
- in the scripts that generate the files in `output` from `input` that I'm writing.

As you like.
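Either way, the conversion itself is simple; a minimal sketch, assuming one often-quoted integral Crab reference flux (the value and function name below are illustrative, not a vetted choice):

```python
# Assumed integral Crab reference flux above 1 TeV; picking the right
# reference per dataset is exactly the ambiguity discussed above
CRAB_FLUX_ABOVE_1TEV = 2.0e-11  # cm-2 s-1

def crab_to_flux(flux_in_crab_units):
    """Convert an integral flux from Crab units to cm-2 s-1 above 1 TeV."""
    return flux_in_crab_units * CRAB_FLUX_ABOVE_1TEV

print(crab_to_flux(0.5))  # 0.5 Crab -> 1e-11 cm-2 s-1
```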
For `data_id`, I've split that into a separate issue. Please comment here: #42
@cdeil Yes, I was surprised that the name was not taken. The avatars you can make here: https://www.powerpuffyourself.com/#!/en Note the deadly cosmic rays in the background! ;)
I agree proper units are ALWAYS better than Crab, and I am fully aware of possible misinterpretations due to conversion with inaccurate spectral info etc. @wegenmat and I are looking for the lost files and references to recover as many original measurements as possible. For any data set we fail to recover, I am OK with the procedure you describe. We should just make it clear to the user that these values are translated from Crab units and are not the original measurements. Also, I do not really care in which step you implement this conversion :)
One more thing: some of the fluxes are ULs, which we converted from actual flux measurements if `flux_error > flux`. We did something like `UL = flux + 3 * flux_error` and called them "3 sigma ULs". This makes the backwards conversion a bit more complicated...
Do you know which points are ULs? Then you can convert back, no?
Anyways ... I guess just do the best you can for your existing collected data?
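A sketch of the forward and backward transformation with made-up numbers, assuming the original `flux_error` is still available:

```python
import numpy as np

# Toy values for illustration only
flux = np.array([2.0e-11, 0.5e-11])        # measured fluxes (cm-2 s-1)
flux_error = np.array([0.3e-11, 0.8e-11])  # 1-sigma errors

# Forward: points with flux_error > flux were turned into "3 sigma ULs"
is_ul = flux_error > flux
value = np.where(is_ul, flux + 3 * flux_error, flux)

# Backward: only possible while flux_error is known
flux_recovered = np.where(is_ul, value - 3 * flux_error, value)
assert np.allclose(flux_recovered, flux)
```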
We should adopt http://gamma-astro-data-formats.readthedocs.io/en/latest/results/flux_points/index.html#error-columns for how to encode ULs. I guess putting `nan` ("not a number") in the other columns is safest.
OK?
In the DESY archive, ULs are marked with `flux_error = -1`.
This also means that, if we do not have the errors of the original measurements, we cannot convert them back :(
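For the points we do keep as ULs, here is a sketch of mapping the DESY convention onto the spec-style encoding. The input file is hypothetical, and the column names `flux_ul` / `is_ul` are my guess at the spec, so please double-check against the document linked above:

```python
import numpy as np
from astropy.table import Table

t = Table.read("desy-lc.ecsv", format="ascii.ecsv")  # hypothetical input file

# DESY convention: flux_error = -1 marks an upper limit
is_ul = t["flux_error"] == -1.0
t["is_ul"] = is_ul
t["flux_ul"] = np.where(is_ul, t["flux"], np.nan)

# Put nan in the other columns for UL rows, as suggested above
t["flux"] = np.where(is_ul, np.nan, t["flux"])
t["flux_error"] = np.where(is_ul, np.nan, t["flux_error"])
```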
I think we can update the task list now:
- [x] Put integer `source_id` instead of the string with `tev-...`
- [x] Instead of putting input data in `input/lightcurves`, we should probably put it in sub-folders in `papers`. I.e. (at least currently) we organise input data by paper, not by data type (like lightcurve here).
- [ ] Change fluxes in the input files to use proper units, as given in the papers, not Crab units.
- [ ] Expose lightcurve files in the `output` or `docs/data` folder (probably as ECSV and `.fits.gz`? See the sketch after this list.)
- [ ] Link to those files from the webpage (ideally automatically, without having to hand-create a list of lightcurves)
- [x] Some of the data doesn't seem to have a `paper_id` (see e.g. here). How should we handle this?
- [ ] Data sets without `paper_id` must be entered into gamma-cat properly. It is planned to create a gamma-cat internal set of references in `input/references`, where each record (= file) has some info like `reference_id` and `url` or `comment`. https://github.com/gammapy/gamma-cat/issues/9#issuecomment-267047595
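For the export item, a rough sketch of how one lightcurve file could be exposed in both formats with astropy (paths are illustrative, and the real script would loop over all lightcurve files):

```python
from astropy.table import Table

# Read the ECSV input and write a gzipped FITS copy next to the docs
t = Table.read("input/data/2009/2009ApJ...691L..13D/tev-000049-lc.ecsv",
               format="ascii.ecsv")
t.write("docs/data/lightcurves/tev-000049-lc.fits.gz", overwrite=True)
```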
@wegenmat - Thanks for updating the task list. I'm changing a lot of things in gamma-cat today (folder, filenames, scripts). Please wait for a day before continuing the work on lightcurves.
I'm going through now, changing the SED files to `-sed.ecsv`, and fixing up LC issues as I see them.
The first one is an apparently empty `input/data/2001/2001ApJ...546..898A/91_2001ApJ...546..898A.ecsv`, which I removed in f78c1e0.
Sorry, it wasn't empty as I stated, but a duplicate of `input/data//2001/2001ApJ...546..898A/tev-000091-lc.ecsv`, so it was OK to remove.
One more LC change: in e7a940c I removed `input/data/2009/2009ApJ...691L..13D/49_2009ApJ...691L..13D.ecsv`. It was an old duplicate of `input/data/2009/2009ApJ...691L..13D/tev-000049-lc.ecsv`.
@wegenmat - For the task list above, maybe you could add one point that these datasets should be entered into gamma-cat properly:
input/data//no_paper/lightcurves/tev-000049/httpveritas.sao.arizona.edu.ecsv
input/data//no_paper/lightcurves/tev-000049/PhD_Martin_Kestel_MPI_Munich.ecsv
input/data//no_paper/lightcurves/tev-000049/reference_empty_1.ecsv
input/data//no_paper/lightcurves/tev-000049/reference_empty_2.ecsv
input/data//no_paper/lightcurves/tev-000091/httpveritas.sao.arizona.eduSummariessummarymrk501.table_1.ecsv
input/data//no_paper/lightcurves/tev-000091/httpveritas.sao.arizona.eduSummariessummarymrk501.table_2.ecsv
input/data//no_paper/lightcurves/tev-000091/reference_empty.ecsv
input/data//no_paper/lightcurves/tev-000138/httpmagic.mppmu.mpg.depublicationsthesesNTonello.pdf.ecsv
input/data//no_paper/lightcurves/tev-000138/httpveritas.sao.arizona.edu.ecsv
input/data//no_paper/lightcurves/tev-000138/N.Tonello.PrivateCommunication.ecsv
input/data//no_paper/lightcurves/tev-000138/reference_empty.ecsv
My suggestion would be to create a gamma-cat internal set of references in `input/references`, where each record (= file) has some info like `reference_id` and `url` or `comment`.
Once that is in place, the LC data would be collected like all the other ones where an ADS `reference_id` exists, not grouped in an `input/data/no_paper` folder.
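A hypothetical sketch of what one such reference record could look like; the keys and values are illustrative, not an agreed schema:

```python
import yaml

# One record per file, e.g. input/references/Whipple_monitoring.yaml
# (file name, keys, and values are illustrative assumptions)
record = {
    "reference_id": "Whipple_monitoring",
    "url": "http://veritas.sao.arizona.edu",
    "comment": "Whipple monitoring web page, no longer active",
}

with open("Whipple_monitoring.yaml", "w") as fh:
    yaml.safe_dump(record, fh, sort_keys=False)
```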
I just now noticed a small issue with the LC data we have.
For a few A&A papers the folder name was incorrect (the `&` character was dropped; it should be encoded as `%26`).
Fixed in f16147e
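For reference, Python's standard library produces this encoding directly; a quick sanity check (not necessarily what the gamma-cat scripts actually do):

```python
from urllib.parse import quote

# quote() percent-encodes '&' but leaves '.' and alphanumerics alone
bibcode = "2010A&A...524A..48T"
print(quote(bibcode))  # -> 2010A%26A...524A..48T
```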