
outlier_i2d.fits file not found error

Open stscijgbot-jp opened this issue 2 years ago • 4 comments

Issue JP-3085 was created on JIRA by Hien Tran:

jw01539-c1010_20230203t072948_image3_00002 failed with the following error:

2023-02-03 14:23:39,099 - stpipe.Image3Pipeline.outlier_detection - INFO - Exposure jw01947004001_02101_00003_mirimage_outlier_i2d.fits saved to file
2023-02-03 14:23:39,109 - stpipe.Image3Pipeline.outlier_detection - INFO - 1 exposures to drizzle together
2023-02-03 14:23:40,266 - stpipe.Image3Pipeline.outlier_detection - INFO - Drizzling (1024, 1032) --> (1028, 1032)
2023-02-03 14:23:40,692 - stpipe.Image3Pipeline.outlier_detection - INFO - Exposure jw01947004001_02101_00001_mirimage_outlier_i2d.fits saved to file
2023-02-03 14:23:40,701 - stpipe.Image3Pipeline.outlier_detection - INFO - 1 exposures to drizzle together

2023-02-03 14:23:45,466 - stpipe.Image3Pipeline.outlier_detection - WARNING - /dms/local/jwst/pipeline/pkgs/miniconda3/envs/jwstdp-1.8.2.20221020-py3.9/lib/python3.9/site-packages/yaml/constructor.py:49: ResourceWarning: unclosed file <_io.BufferedReader name=‘jw01947004001_02101_00003_mirimage_outlier_i2d.fits’>
node = self.get_single_node()
FileNotFoundError: [Errno 2] No such file or directory: ‘jw01947004001_02101_00001_mirimage_outlier_i2d.fits’
----------------------------------------------------------------------
ERROR RUNNING STEP ‘Image3Pipeline’:
[Errno 2] No such file or directory:
‘jw01947004001_02101_00001_mirimage_outlier_i2d.fits’
----------------------------------------------------------------------

The file does exist on disk, but with a timestamp of 14:23:50 UT, 5 seconds after the failure above. A similar thing happened in another log about 6 hours earlier:

2023-02-03 08:32:58,371 - stpipe.Image3Pipeline.outlier_detection - INFO - Exposure jw01539075001_02104_00002_nrcb1_outlier_i2d.fits saved to file
FileNotFoundError: [Errno 2] No such file or directory: ‘jw01539075001_02104_00002_nrcb1_outlier_i2d.fits’

ERROR RUNNING STEP ‘Image3Pipeline’: [Errno 2] No such file or directory: ‘jw01539075001_02104_00002_nrcb1_outlier_i2d.fits’

Apparently the file was written, but was not yet available for reading shortly afterward.

From the jwst-dms-isilon-issue-wg Slack thread discussion:

walker  12 hours ago

The sync mount option

The NFS client treats the sync mount option differently than some other file systems (refer to mount(8) for a description of the generic sync and async mount options). If neither sync nor async is specified (or if the async option is specified), the NFS client delays sending application writes to the server until any of these events occur:

- Memory pressure forces reclamation of system memory resources.
- An application flushes file data explicitly with sync(2), msync(2), or fsync(3).
- An application closes a file with close(2).
- The file is locked/unlocked via fcntl(2).

In other words, under normal circumstances, data written by an application may not immediately appear on the server that hosts the file.

If the sync option is specified on a mount point, any system call that writes data to files on that mount point causes that data to be flushed to the server before the system call returns control to user space. This provides greater data cache coherence among clients, but at a significant performance cost.

Applications can use the O_SYNC open flag to force application writes to individual files to go to the server immediately without the use of the sync mount option.

walker  12 hours ago

We do not specify sync or async, so it is async.

walker  12 hours ago

You can use O_SYNC in the code to make it work I guess.

mswam  12 hours ago

@bushouse Howard can you check which file-write options are used in the CAL code when saving *_outlier.fits files? Tom suggests that we might need to adjust them, to have a better chance of successful read after write/save.
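
For reference, the two write-side options mentioned in the thread (an explicit flush with fsync, or opening the file with O_SYNC) might look roughly like this in plain Python. This is only a sketch: the function names `write_and_fsync` and `write_with_o_sync` are hypothetical, the example writes raw bytes rather than going through the FITS-writing code the pipeline actually uses, and the thread does not establish that either option resolves the behavior seen on the Isilon mount.

```python
import os


def write_and_fsync(path, data):
    """Write bytes to `path`, then explicitly flush before closing (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # drain Python's own userspace buffer
        os.fsync(f.fileno())   # ask the OS to push pending writes out


def write_with_o_sync(path, data):
    """Write bytes to `path` through a descriptor opened with O_SYNC (hypothetical helper)."""
    # O_SYNC is not defined on every platform, so fall back to 0 (no extra flag).
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o644)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
```

Both variants trade write performance for earlier flushing, which is the cost the quoted documentation warns about for the sync mount option.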

stscijgbot-jp avatar Jun 28 '23 18:06 stscijgbot-jp

Comment by Jesse Doggett on JIRA:

I think that immediately after any files are written that need to be read back in, the cal code should NOT proceed until it can verify that the file is available for reading.
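
A minimal sketch of such a check, assuming a simple polling loop is acceptable: the helper name `wait_until_readable` and its `timeout` and `interval` parameters are hypothetical and are not part of the jwst package.

```python
import time


def wait_until_readable(path, timeout=30.0, interval=0.5):
    """Poll until `path` can be opened and read, or until `timeout` seconds pass.

    Returns True if the file became readable, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path, "rb") as f:
                f.read(1)  # touching the data confirms more than mere existence
            return True
        except (FileNotFoundError, PermissionError):
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller would invoke this right after saving a file and raise or log an error if it returns False, rather than proceeding to the read.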

stscijgbot-jp avatar Jun 28 '23 18:06 stscijgbot-jp

Comment by Mike Swam on JIRA:

We have to do a similar read-retry operation in several SDP locations, because the Isilon performance does not always support a quick read of a file that was just closed.
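
The SDP code is not shown in this ticket, so the following is only an illustration of what a read-retry with increasing delays could look like. The name `read_with_retry` and its parameters are hypothetical.

```python
import time


def read_with_retry(path, attempts=5, initial_delay=0.5, backoff=2.0):
    """Read `path` as bytes, retrying on FileNotFoundError with growing delays.

    Re-raises the last FileNotFoundError if every attempt fails.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            with open(path, "rb") as f:
                return f.read()
        except FileNotFoundError:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay)
            delay *= backoff  # wait longer before the next try
```

Unlike a fixed sleep, this returns as soon as the file is visible and only pays the full wait when the file system is slow.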

 

stscijgbot-jp avatar Jun 28 '23 18:06 stscijgbot-jp

Comment by Hien Tran on JIRA:

Instead of verifying that the file is available, perhaps it's simpler to just add a sleep (say, ~1 min) in the cal code before moving on.

stscijgbot-jp avatar Jun 28 '23 18:06 stscijgbot-jp

Comment by Maria Pena-Guerrero on JIRA:

We decided to put this ticket back in the backlog while we determine if it is still an issue after we review and merge the code for https://jira.stsci.edu/browse/JP-2943.

 

stscijgbot-jp avatar Mar 08 '24 15:03 stscijgbot-jp

Comment by Maria Pena-Guerrero on JIRA:

Is this still happening? If so: the outlier_i2d.fits file is saved in the working directory, so is the pipeline being run in a directory different from the one where the input data lives?

stscijgbot-jp avatar Apr 23 '24 17:04 stscijgbot-jp

Comment by Maria Pena-Guerrero on JIRA:

We expect that PR #8418 fixes this issue. We will wait until those changes are tested and if the problem still exists we will re-open this ticket.

stscijgbot-jp avatar Apr 24 '24 13:04 stscijgbot-jp
