gamma-cat
Add data from DESY light curve archive
This is a reminder issue that we should add the data from the DESY light curve archive: https://astro.desy.de/gamma_astronomy/magic/projects/light_curve_archive/index_eng.html
Paper: http://adsabs.harvard.edu/abs/2010A%26A...524A..48T
Contact: Elisa Bernardini
They define a "simple lightcurve format".
I think we should use something similar, but more consistent with the other formats from the gamma-astro-data-formats specs.
Format discussions should go here: https://github.com/open-gamma-ray-astro/gamma-astro-data-formats/pull/61#issuecomment-244735482
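To make this concrete, here is a purely illustrative sketch of what such a uniform lightcurve table could look like as ECSV, written with astropy. The column names and units are my assumptions, not the format agreed in that discussion:

```python
from astropy.table import Table
import astropy.units as u

# Illustrative lightcurve table; column names/units are assumptions
lc = Table()
lc["time_min"] = [53945.1, 53946.1]  # MJD, start of each time bin
lc["time_max"] = [53945.2, 53946.2]  # MJD, end of each time bin
lc["flux"] = [2.1e-11, 1.8e-11] * u.Unit("cm-2 s-1")
lc["flux_err"] = [0.3e-11, 0.4e-11] * u.Unit("cm-2 s-1")

# ECSV preserves the units and column metadata
lc.write("example-lc.ecsv", format="ascii.ecsv", overwrite=True)
```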
@wegenmat - Welcome to GitHub!
Thanks. Looking forward to our collaboration.
This was partly implemented by @wegenmat in #26.
EDIT: task list removed here, added by @wegenmat in a copy below.
@cdeil Q: Change fluxes in the input files to use proper units, as given in the papers, not Crab units.
A: Might not be possible for all data :( It seems like the original data files got lost after Martin left, and part of his account is not available any more. I will talk to Elisa and we'll try to recover them.

Q: Some of the data doesn't seem to have a `paper_id` (see e.g. here). How should we handle this?
A: The VERITAS/Whipple data which have only the web page ref are from the Whipple monitoring web page, which is not active any more. Actually, we should probably contact the VERITAS collaboration before making them public. No clue how to handle this. Maybe give a generic string `Whipple_monitoring` as `paper_id`.
@Konstancja - That's pretty cool that you got that username, and I also like your avatar! :-)
For units:
Do you agree that using proper units for fluxes is better than Crab units? (For Crab fluxes there are many different references in use, so you could find spurious AGN variability that comes purely from misunderstandings about which reference was used.)
If yes, then I'd say let's just do the best we can: where we have fluxes in proper units, we use those. Where we only have fluxes in Crab units and the dataset is one you'd like to keep, we keep Crab units in the `input` folder, and then convert to proper units in the "as uniform as possible" `output` files that we give to users, choosing the Crab reference as well as possible.
There are pros and cons to applying the Crab to proper flux conversion either:
- in the scripts that generate the files in `input` that @wegenmat writes, or
- in the scripts that generate the files in `output` from `input` that I'm writing.

As you like.
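Either way, the conversion itself is simple; a minimal sketch, assuming one often-quoted integral Crab reference flux (the value and function name below are illustrative, not a vetted choice):

```python
# Assumed integral Crab reference flux above 1 TeV; picking the right
# reference per dataset is exactly the ambiguity discussed above
CRAB_FLUX_ABOVE_1TEV = 2.0e-11  # cm-2 s-1

def crab_to_flux(flux_in_crab_units):
    """Convert an integral flux from Crab units to cm-2 s-1 above 1 TeV."""
    return flux_in_crab_units * CRAB_FLUX_ABOVE_1TEV

print(crab_to_flux(0.5))  # 0.5 Crab -> 1e-11 cm-2 s-1
```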
For `data_id`, I've split that into a separate issue. Please comment here: #42
@cdeil Yes, I was surprised that the name was not taken. The avatars you can make here: https://www.powerpuffyourself.com/#!/en Note the deadly cosmic rays in the background! ;)
I agree proper units are ALWAYS better than Crab, and I am fully aware of possible misinterpretations due to conversion with inaccurate spectral info etc. @wegenmat and I are looking for the lost files and references to recover as many original measurements as possible. For any data set we fail to recover, I am OK with the procedure you describe. We should just make it clear to the user that these values are translated from Crab units and are not the original measurements. Also, I do not really care in which step you implement this conversion :)
One more thing: some of the fluxes are ULs, which we converted from actual flux measurements if `flux_error > flux`. We did something like `UL = flux + 3 * flux_error` and called them "3 sigma ULs". This makes the backwards conversion a bit more complicated...
Do you know which points are ULs? Then you can convert back, no?
Anyways ... I guess just do the best you can for your existing collected data?
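A sketch of the forward and backward transformation with made-up numbers, assuming the original `flux_error` is still available:

```python
import numpy as np

# Toy values for illustration only
flux = np.array([2.0e-11, 0.5e-11])        # measured fluxes (cm-2 s-1)
flux_error = np.array([0.3e-11, 0.8e-11])  # 1-sigma errors

# Forward: points with flux_error > flux were turned into "3 sigma ULs"
is_ul = flux_error > flux
value = np.where(is_ul, flux + 3 * flux_error, flux)

# Backward: only possible while flux_error is known
flux_recovered = np.where(is_ul, value - 3 * flux_error, value)
assert np.allclose(flux_recovered, flux)
```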
We should adopt http://gamma-astro-data-formats.readthedocs.io/en/latest/results/flux_points/index.html#error-columns for how to encode ULs. I guess putting `nan` ("not a number") in the other columns is safest.
OK?
In the DESY archive, ULs are marked with `flux_error = -1`.
This also means that, if we do not have the errors of the original measurements, we cannot convert them back :(
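For the points we do keep as ULs, here is a sketch of mapping the DESY convention onto the spec-style encoding. The input file is hypothetical, and the column names `flux_ul` / `is_ul` are my guess at the spec, so please double-check against the document linked above:

```python
import numpy as np
from astropy.table import Table

t = Table.read("desy-lc.ecsv", format="ascii.ecsv")  # hypothetical input file

# DESY convention: flux_error = -1 marks an upper limit
is_ul = t["flux_error"] == -1.0
t["is_ul"] = is_ul
t["flux_ul"] = np.where(is_ul, t["flux"], np.nan)

# Put nan in the other columns for UL rows, as suggested above
t["flux"] = np.where(is_ul, np.nan, t["flux"])
t["flux_error"] = np.where(is_ul, np.nan, t["flux_error"])
```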
I think we can update the task list now:
- [x] Put integer `source_id` instead of the string with `tev-...`
- [x] Instead of putting input data in `input/lightcurves`, we should probably put it in sub-folders in `papers`. I.e. (at least currently) we organise input data by paper, not by data type (like lightcurve here).
- [ ] Change fluxes in the input files to use proper units, as given in the papers, not Crab units.
- [ ] Expose lightcurve files in the `output` or `docs/data` folder (probably as ECSV and `.fits.gz`? See the sketch after this list.)
- [ ] Link to those files from the webpage (ideally automatically, without having to hand-create a list of lightcurves)
- [x] Some of the data doesn't seem to have a `paper_id` (see e.g. here). How should we handle this?
- [ ] Data sets without `paper_id` must be entered into gamma-cat properly. It is planned to create a gamma-cat internal set of references in `input/references`, where each record (= file) has some info like `reference_id` and `url` or `comment`. https://github.com/gammapy/gamma-cat/issues/9#issuecomment-267047595
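For the export item, a rough sketch of how one lightcurve file could be exposed in both formats with astropy (paths are illustrative, and the real script would loop over all lightcurve files):

```python
from astropy.table import Table

# Read the ECSV input and write a gzipped FITS copy next to the docs
t = Table.read("input/data/2009/2009ApJ...691L..13D/tev-000049-lc.ecsv",
               format="ascii.ecsv")
t.write("docs/data/lightcurves/tev-000049-lc.fits.gz", overwrite=True)
```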
@wegenmat - Thanks for updating the task list. I'm changing a lot of things in gamma-cat today (folder, filenames, scripts). Please wait for a day before continuing the work on lightcurves.
I'm going through now, changing the SED files to `-sed.ecsv`, and fixing up LC issues as I see them.
The first one is an apparently empty `input/data/2001/2001ApJ...546..898A/91_2001ApJ...546..898A.ecsv`, which I removed in f78c1e0.
Sorry, it wasn't empty as I stated, but a duplicate of `input/data//2001/2001ApJ...546..898A/tev-000091-lc.ecsv`, so it was OK to remove.
One more LC change: in e7a940c I removed `input/data/2009/2009ApJ...691L..13D/49_2009ApJ...691L..13D.ecsv`. It was an old duplicate of `input/data/2009/2009ApJ...691L..13D/tev-000049-lc.ecsv`.
@wegenmat - For the task list above, maybe you could add one point that these datasets should be entered into gamma-cat properly:
input/data//no_paper/lightcurves/tev-000049/httpveritas.sao.arizona.edu.ecsv
input/data//no_paper/lightcurves/tev-000049/PhD_Martin_Kestel_MPI_Munich.ecsv
input/data//no_paper/lightcurves/tev-000049/reference_empty_1.ecsv
input/data//no_paper/lightcurves/tev-000049/reference_empty_2.ecsv
input/data//no_paper/lightcurves/tev-000091/httpveritas.sao.arizona.eduSummariessummarymrk501.table_1.ecsv
input/data//no_paper/lightcurves/tev-000091/httpveritas.sao.arizona.eduSummariessummarymrk501.table_2.ecsv
input/data//no_paper/lightcurves/tev-000091/reference_empty.ecsv
input/data//no_paper/lightcurves/tev-000138/httpmagic.mppmu.mpg.depublicationsthesesNTonello.pdf.ecsv
input/data//no_paper/lightcurves/tev-000138/httpveritas.sao.arizona.edu.ecsv
input/data//no_paper/lightcurves/tev-000138/N.Tonello.PrivateCommunication.ecsv
input/data//no_paper/lightcurves/tev-000138/reference_empty.ecsv
My suggestion would be to create a gamma-cat internal set of references in `input/references`, where each record (= file) has some info like `reference_id` and `url` or `comment`.
Once that is in place, the LC data would be collected like all the other ones where an ADS `reference_id` exists, not grouped in an `input/data/no_paper` folder.
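A hypothetical sketch of what one such reference record could look like; the keys and values are illustrative, not an agreed schema:

```python
import yaml

# One record per file, e.g. input/references/Whipple_monitoring.yaml
# (file name, keys, and values are illustrative assumptions)
record = {
    "reference_id": "Whipple_monitoring",
    "url": "http://veritas.sao.arizona.edu",
    "comment": "Whipple monitoring web page, no longer active",
}

with open("Whipple_monitoring.yaml", "w") as fh:
    yaml.safe_dump(record, fh, sort_keys=False)
```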
I just now noticed a small issue with the LC data we have.
For a few A&A papers the folder name was incorrect (the `&` character was dropped; it should be encoded as `%26`).
Fixed in f16147e
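For reference, Python's standard library produces this encoding directly; a quick sanity check (not necessarily what the gamma-cat scripts actually do):

```python
from urllib.parse import quote

# quote() percent-encodes '&' but leaves '.' and alphanumerics alone
bibcode = "2010A&A...524A..48T"
print(quote(bibcode))  # -> 2010A%26A...524A..48T
```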