ccdproc
ccdproc copied to clipboard
A workaround for unfixable header keys in ImageFileCollection
Unfixable header keys are currently populating the summary_dict with their genuine values and also missing value place holders, resulting in lists of values that are too long, breaking the ImageFileCollection for data with these malformed headers.
I tried to track down the root cause, and could pinpoint it to the _dict_from_fits_header() method called within _fits_summary. My hunch is there's something special about these unfixable keys that gets them double counted as missing_in_this_file. A key clue is that the resulting list for offending keys is always 2*N-1 long, where N is the number of files.
This workaround is really just that-- a stopgap measure to get this to work on the types of files I have. I suspect the right solution is to dig a bit deeper on _dict_from_fits_header, so I didn't go through with a full check of tests, assuming this workaround won't actually be in the final product.
Here are urls to two example files that reproduce the problem:
https://koa.ipac.caltech.edu/cgi-bin/getKOA/nph-getKOA?instrument=NS&filehand=/koadata11/NIRSPEC/20010117/lev0/spec/NS.20010117.25279.fits
https://koa.ipac.caltech.edu/cgi-bin/getKOA/nph-getKOA?instrument=NS&filehand=/koadata11/NIRSPEC/20010117/lev0/spec/NS.20010117.25430.fits
The malformed keys are GAIN.SPE and FREQ.SPE, though they do not appear in the error message, truncated below:
627 pass#summary_dict[key]= [None]*len(summary_dict['file'])
628
--> 629 summary_table = Table(summary_dict, masked=True)
630
631 for column in summary_table.colnames:
ValueError: Inconsistent data column lengths: {4, 7}
Please have a look at the following list and replace the "[ ]" with a "[x]" if the answer to this question is yes.
- [x] For new contributors: Did you add yourself to the "Authors.rst" file?
For documentation changes:
- [x] For documentation changes: Does your commit message include a "[skip ci]"? Note that it should not if you changed any examples!
For bugfixes:
- [x] Did you add an entry to the "Changes.rst" file?
- [ ] Did you add a regression test?
- [ ] Does the commit message include a "Fixes #issue_number" (replace "issue_number").
- [ ] Does this PR add, rename, move or remove any existing functions or parameters?
For new functionality:
- [ ] Did you add an entry to the "Changes.rst" file?
- [ ] Did you include a meaningful docstring with Parameters, Returns and Examples?
- [ ] Does the commit message include a "Fixes #issue_number" (replace "issue_number").
- [ ] Did you include tests for the new functionality?
- [ ] Does this PR add, rename, move or remove any existing functions or parameters?
Please note that the last point is not a requirement. It is meant as a check if the pull request potentially breaks backwards-compatibility.
I am seeing that this PR breaks 1 test:
py.test -vsk test_filenames_are_set_properly
Currently investigating...
Thanks -- can you please ping when you have fixed that? Tests are currently not running on travis because they've dropped support for open source projects.
Also, don't be shy about pinging me multiple times 😬
I fixed the failed test which in hindsight was an edge case of a None header. Also added a warning logger.
I'm a bit muddled on this PR. On one hand it's a quick fix that makes it possible to move forward on any data that has malformed FITS keywords, which may be relatively common and hindering other folks from getting started. On the other hand it just treats the symptom not the sickness, and it loses available information (currently just fills the keyword with Nones rather than attempt to sort out the genuine values). In other words the image collection class would come with the proviso: "Currently does not support malformed header keywords".
Probably best practices would be to add a test that has such a malformed keyword and then make sure the correct key:value pairs are returned. Or if remote data can be tolerated in the test suite, then the two files I linked above would reveal the problem.
I'm going to close/reopen to see if that picks up the new GitHub Actions CI. More substantive review tomorrow, I hope...
Codecov Report
Merging #743 (53d4877) into main (5af6ee5) will decrease coverage by
2.26%. The diff coverage is71.42%.
:exclamation: Current head 53d4877 differs from pull request most recent head d9d7128. Consider uploading reports for the commit d9d7128 to get more accurate results
@@ Coverage Diff @@
## main #743 +/- ##
==========================================
- Coverage 97.55% 95.30% -2.26%
==========================================
Files 9 30 +21
Lines 1392 3895 +2503
==========================================
+ Hits 1358 3712 +2354
- Misses 34 183 +149
| Files Changed | Coverage Δ | |
|---|---|---|
| ccdproc/image_collection.py | 98.40% <60.00%> (-0.80%) |
:arrow_down: |
| ccdproc/log_meta.py | 96.25% <100.00%> (ø) |
... and 26 files with indirect coverage changes
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more