drizzlepac

temporary ecsv file not removed and was ingested

Status: Open. stscijgbot-hstdp opened this issue 1 year ago • 4 comments

Issue HLA-1174 was created on JIRA by Lisa Sherbert:

While testing HSTSDP-2022 to make sure catalog files were not produced, I found an unexpected catalog file and discovered that it had also been ingested.

cmd is: sqsh -S GROUCHO -D dadstest2 -e -i hla.sql -P z -U
[0] GROUCHO.dadstest2.1> SELECT CONVERT(VARCHAR(55), afi_file_name) afi_file_name,
[0] GROUCHO.dadstest2.2>        afi_archive_class,
[0] GROUCHO.dadstest2.3>        afi_generation_date
[0] GROUCHO.dadstest2.4>  FROM dbo.archive_files
[0] GROUCHO.dadstest2.5>  WHERE afi_generation_date > '2023-12-08' and afi_file_name like '%ecsv'
 afi_file_name                                           afi_archive_class afi_generation_date
 ------------------------------------------------------- ----------------- --------------------------
 hst_5397_17_wfpc2_pc_f555w_u27817_point-cat-fxm.ecsv    HFS               Dec  8 2023 09:50:41:000PM

(1 row affected)

hst_5397_17_wfpc2_pc_f555w_u27817_point-cat-fxm.ecsv is a temporary file, which we really do not want to ingest. If it is not produced the next time the dataset is ingested, it is not removed from the online cache. So we would really like it not to be there when we ingest.

On tldmscsched2, in nigel_u27817_1702071124.524341/ALOG_1702071626_WFPC2_SingleVisitMosaic_u27817.out, I see:

2023342214027 INFO src=wfpc2_svm.set_env_vars msg="os.environ['SVM_CATALOG_PC'] = 'OFF'"
...
2023342214609 WARNING src=drizzlepac.haputils.svm_quality_analysis- [compare_ra_dec_crossmatches] Catalog hst_5397_17_wfpc2_pc_f555w_u27817_point-cat.ecsv Missing!  No comparison can be made.
...
2023342214609 WARNING src=drizzlepac.haputils.svm_quality_analysis- Catalog hst_5397_17_wfpc2_pc_f555w_u27817_point-cat.ecsv does not exist.  Both the Point and Segment catalogs must exist for comparison.
...
2023342214645 INFO src=drizzlepac.haputils.svm_quality_analysis- Crossmatch reference image hst_5397_17_wfpc2_pc_f555w_u27817_drz.fits contains 1 sources.
2023342214645 INFO src=drizzlepac.haputils.svm_quality_analysis-
2023342214645 INFO src=drizzlepac.haputils.svm_quality_analysis- Wrote temporary source catalog hst_5397_17_wfpc2_pc_f555w_u27817_point-cat-fxm.ecsv
2023342214645 WARNING src=drizzlepac.haputils.svm_quality_analysis- HAP Point sourcelist interfilter comparison (compare_interfilter_crossmatches) encountered a problem.
2023342214645 ERROR src=drizzlepac.haputils.svm_quality_analysis- message
Traceback (most recent call last):
  File "/hsttst/project/pipeline/pkgs/miniconda3/envs/caldp_satandtools/lib/python3.9/site-packages/drizzlepac/haputils/svm_quality_analysis.py", line 1920, in run_quality_analysis
    compare_interfilter_crossmatches(total_obj_list, json_timestamp=json_timestamp,
  File "/hsttst/project/pipeline/pkgs/miniconda3/envs/caldp_satandtools/lib/python3.9/site-packages/drizzlepac/haputils/svm_quality_analysis.py", line 668, in compare_interfilter_crossmatches
    filtobj_dict[imgname] = transform_coords(filtobj_dict[imgname],
  File "/hsttst/project/pipeline/pkgs/miniconda3/envs/caldp_satandtools/lib/python3.9/site-packages/drizzlepac/haputils/svm_quality_analysis.py", line 925, in transform_coords
    xy_centroid_values = np.stack((filtobj_subdict['sources']['xcentroid'],
  File "/hsttst/project/pipeline/pkgs/miniconda3/envs/caldp_satandtools/lib/python3.9/site-packages/astropy/table/table.py", line 2055, in __getitem__
    return self.columns[item]
  File "/hsttst/project/pipeline/pkgs/miniconda3/envs/caldp_satandtools/lib/python3.9/site-packages/astropy/table/table.py", line 264, in __getitem__
    return OrderedDict.__getitem__(self, item)

I’m assuming the temporary file was left behind because of that traceback?
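If the traceback is what left the file behind, one common Python pattern is to tie the temporary file's lifetime to a `try`/`finally` block so it is deleted even when the comparison raises. The sketch below assumes that approach. `temporary_catalog` is a hypothetical helper, not part of drizzlepac, and the `KeyError` in the usage lines is only a stand-in for the failure in the log.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_catalog(path):
    """Yield `path` and delete that file on exit, even if the body raises.

    Hypothetical helper (not drizzlepac API) illustrating cleanup of a
    temporary "-fxm.ecsv" catalog after a failed comparison.
    """
    try:
        yield path
    finally:
        # Only remove the file if something actually wrote it.
        if os.path.exists(path):
            os.remove(path)


# Usage sketch: the file is gone afterwards even though the body raised.
try:
    with temporary_catalog("demo_point-cat-fxm.ecsv") as tmp:
        with open(tmp, "w") as fh:
            fh.write("# %ECSV 1.0\n")
        raise KeyError("xcentroid")  # stand-in for the comparison failure
except KeyError:
    pass
print(os.path.exists("demo_point-cat-fxm.ecsv"))  # False
```

The exception still propagates after cleanup, so the existing warning and error logging would be unaffected; only the leftover file goes away.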

stscijgbot-hstdp, Dec 08 '23 22:12

Comment by Lisa Sherbert on JIRA:

Would this error only happen if metrics collection is turned on? If so, then it should not happen in Ops, because metrics are not collected there.

Next step: turn COLLECT_INS_METRICS off and test it on Test.

stscijgbot-hstdp, Dec 11 '23 15:12

Comment by Lisa Sherbert on JIRA:

I was able to verify that u27817 did NOT produce the temporary ecsv file when COLLECT_INS_METRICS was set to false, which is good. This issue should NOT happen in Operations.

Test is supposed to be Ops-like, but in this case it is not: we collect metrics on Test but have not been able to do anything with them lately.
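The behavior verified above suggests a gate: the quality-analysis step that writes the temporary `-fxm.ecsv` catalog appears to run only when metrics collection is on. A minimal sketch of such a gate follows, assuming `COLLECT_INS_METRICS` is read from the environment as a true/false string. The accepted values, the unset default, and the `run_svm_step` driver are all hypothetical; this ticket does not show how the pipeline actually parses the variable.

```python
import os


def metrics_enabled(environ=None):
    """Interpret COLLECT_INS_METRICS as a boolean flag.

    Assumptions (not from the pipeline source): the value is a string such
    as "true"/"false", and an unset variable means metrics are off.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("COLLECT_INS_METRICS", "false")
    return value.strip().lower() in {"true", "t", "yes", "y", "1", "on"}


def run_svm_step(environ=None):
    """Hypothetical driver: skip quality analysis (and its temp files) when metrics are off."""
    if metrics_enabled(environ):
        return "quality analysis ran"
    return "quality analysis skipped"


print(run_svm_step({"COLLECT_INS_METRICS": "false"}))  # quality analysis skipped
print(run_svm_step({"COLLECT_INS_METRICS": "True"}))   # quality analysis ran
```

Passing the environment mapping in explicitly keeps the check testable without modifying `os.environ`.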

stscijgbot-hstdp, Dec 12 '23 14:12

Comment by Steve Goldman on JIRA:

Hey Lisa Sherbert, any update on this ticket?

stscijgbot-hstdp, Dec 18 '23 14:12

Comment by Lisa Sherbert on JIRA:

Perhaps it does not even need to be worked? It may be that you want to keep that file around when that kind of error occurs. This likely needs to be discussed with Michele.

I was mainly concerned that we were ingesting it, but that will NOT be an issue in Ops. Test is collecting metrics (why we still do so is an open question, since we are not doing anything with them and cannot, because the tools we were going to use no longer work), but Ops will NOT collect metrics.

Weeding out files that should not be ingested, while still allowing calibration to create new products to be ingested, is a difficult problem. I thought at some point we were using the manifest file to determine what to ingest, but that does not seem to be the case, at least not with WFPC2.
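Using a manifest as an allowlist, as mentioned above, would avoid having to weed out temporary files: anything not listed is simply not ingested. A minimal sketch follows, assuming a plain-text manifest with one file name per line. The real manifest format used by the pipeline is not described in this ticket, and `files_to_ingest` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def files_to_ingest(product_dir, manifest_path):
    """Return names of files in product_dir that the manifest lists.

    Assumed manifest format (hypothetical): one file name per line;
    blank lines and lines starting with "#" are ignored.
    """
    listed = set()
    for line in Path(manifest_path).read_text().splitlines():
        name = line.strip()
        if name and not name.startswith("#"):
            listed.add(name)
    # Anything written to the directory but absent from the manifest,
    # such as a leftover temporary catalog, is excluded.
    return sorted(
        p.name for p in Path(product_dir).iterdir()
        if p.is_file() and p.name in listed
    )


# Usage: the stray "-fxm.ecsv" file is ignored because it is not listed.
with tempfile.TemporaryDirectory() as d:
    for name in ("hst_demo_drz.fits", "hst_demo_point-cat-fxm.ecsv"):
        (Path(d) / name).write_text("")
    manifest = Path(d) / "demo_manifest.txt"
    manifest.write_text("hst_demo_drz.fits\n")
    print(files_to_ingest(d, manifest))  # ['hst_demo_drz.fits']
```

An allowlist fails safe: a new temporary file that nobody anticipated is excluded by default, whereas a denylist of patterns has to be updated each time one appears.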

stscijgbot-hstdp, Dec 18 '23 15:12