ccdproc icon indicating copy to clipboard operation
ccdproc copied to clipboard

A workaround for unfixable header keys in ImageFileCollection

Open gully opened this issue 5 years ago • 5 comments

Unfixable header keys are currently populating the summary_dict with their genuine values and also missing value place holders, resulting in lists of values that are too long, breaking the ImageFileCollection for data with these malformed headers.

I tried to track down the root cause, and could pinpoint it to the _dict_from_fits_header() method called within _fits_summary. My hunch is there's something special about these unfixable keys that gets them double counted as missing_in_this_file. A key clue is that the resulting list for offending keys is always 2*N-1 long, where N is the number of files.

This workaround is really just that-- a stopgap measure to get this to work on the types of files I have. I suspect the right solution is to dig a bit deeper on _dict_from_fits_header, so I didn't go through with a full check of tests, assuming this workaround won't actually be in the final product.

Here are urls to two example files that reproduce the problem:

https://koa.ipac.caltech.edu/cgi-bin/getKOA/nph-getKOA?instrument=NS&filehand=/koadata11/NIRSPEC/20010117/lev0/spec/NS.20010117.25279.fits

https://koa.ipac.caltech.edu/cgi-bin/getKOA/nph-getKOA?instrument=NS&filehand=/koadata11/NIRSPEC/20010117/lev0/spec/NS.20010117.25430.fits

The malformed keys are GAIN.SPE and FREQ.SPE, though they do not appear in the error message, truncated below:


    627                 pass#summary_dict[key]= [None]*len(summary_dict['file'])
    628 
--> 629         summary_table = Table(summary_dict, masked=True)
    630 
    631         for column in summary_table.colnames:


ValueError: Inconsistent data column lengths: {4, 7}

Please have a look at the following list and replace the "[ ]" with a "[x]" if the answer to this question is yes.

  • [x] For new contributors: Did you add yourself to the "Authors.rst" file?

For documentation changes:

  • [x] For documentation changes: Does your commit message include a "[skip ci]"? Note that it should not if you changed any examples!

For bugfixes:

  • [x] Did you add an entry to the "Changes.rst" file?
  • [ ] Did you add a regression test?
  • [ ] Does the commit message include a "Fixes #issue_number" (replace "issue_number").
  • [ ] Does this PR add, rename, move or remove any existing functions or parameters?

For new functionality:

  • [ ] Did you add an entry to the "Changes.rst" file?
  • [ ] Did you include a meaningful docstring with Parameters, Returns and Examples?
  • [ ] Does the commit message include a "Fixes #issue_number" (replace "issue_number").
  • [ ] Did you include tests for the new functionality?
  • [ ] Does this PR add, rename, move or remove any existing functions or parameters?

Please note that the last point is not a requirement. It is meant as a check if the pull request potentially breaks backwards-compatibility.


gully avatar Aug 11 '20 23:08 gully

I am seeing that this PR breaks 1 test:
py.test -vsk test_filenames_are_set_properly Currently investigating...

gully avatar Nov 17 '20 19:11 gully

Thanks -- can you please ping when you have fixed that? Tests are currently not running on travis because they've dropped support for open source projects.

Also, don't be shy about pinging me multiple times 😬

mwcraig avatar Nov 17 '20 19:11 mwcraig

I fixed the failed test which in hindsight was an edge case of a None header. Also added a warning logger. I'm a bit muddled on this PR. On one hand it's a quick fix that makes it possible to move forward on any data that has malformed FITS keywords, which may be relatively common and hindering other folks from getting started. On the other hand it just treats the symptom not the sickness, and it loses available information (currently just fills the keyword with Nones rather than attempt to sort out the genuine values). In other words the image collection class would come with the proviso: "Currently does not support malformed header keywords".

Probably best practices would be to add a test that has such a malformed keyword and then make sure the correct key:value pairs are returned. Or if remote data can be tolerated in the test suite, then the two files I linked above would reveal the problem.

gully avatar Nov 17 '20 19:11 gully

I'm going to close/reopen to see if that picks up the new GitHub Actions CI. More substantive review tomorrow, I hope...

mwcraig avatar Nov 29 '20 23:11 mwcraig

Codecov Report

Merging #743 (53d4877) into main (5af6ee5) will decrease coverage by 2.26%. The diff coverage is 71.42%.

:exclamation: Current head 53d4877 differs from pull request most recent head d9d7128. Consider uploading reports for the commit d9d7128 to get more accurate results

@@            Coverage Diff             @@
##             main     #743      +/-   ##
==========================================
- Coverage   97.55%   95.30%   -2.26%     
==========================================
  Files           9       30      +21     
  Lines        1392     3895    +2503     
==========================================
+ Hits         1358     3712    +2354     
- Misses         34      183     +149     
Files Changed Coverage Δ
ccdproc/image_collection.py 98.40% <60.00%> (-0.80%) :arrow_down:
ccdproc/log_meta.py 96.25% <100.00%> (ø)

... and 26 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Nov 29 '20 23:11 codecov[bot]