Realization marked as failed, but all fm steps completed (2024.04)

Open larsevj opened this issue 10 months ago • 4 comments

I ran a Drogon case, and on iteration 3 four (4) realizations were marked as failed in the GUI, even though all forward model steps were marked as successful and the OK file was written. The following error messages were found in the logs:

status from done callback: Error reading GEN_DATA: R_A2_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/RFT_R_A2_1']
Error reading GEN_DATA: R_A3_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/RFT_R_A3_1']
Error reading GEN_DATA: R_A4_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/RFT_R_A4_1']
Error reading GEN_DATA: R_A5_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/RFT_R_A5_1']
Error reading GEN_DATA: R_A6_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/RFT_R_A6_1']
Error reading GEN_DATA: TRACER_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/tracer/drogon_tracer_sim_1.txt']
Error reading GEN_DATA: AMP_2020_2018_TOP, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/share/results/points/topvolantis_amplitude_mean_20200701_20180101_1.txt']
Error reading GEN_DATA: AMP_2020_2018_BASE, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/share/results/points/basevolantis_amplitude_mean_20200701_20180101_1.txt']
ERROR    Realization: 60 failed after reaching max submit (1):
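Each log line pairs a GEN_DATA key with a result file that was not found. The following is a minimal sketch of how such messages could be produced. It assumes that the `_1` suffix comes from substituting a report step into a `%d` file-name template. The function name `check_gen_data_outputs` and the key-to-template mapping are hypothetical and are not ERT's internal API.

```python
from pathlib import Path


def check_gen_data_outputs(run_path: Path, gen_data: dict[str, str], report_step: int) -> list[str]:
    """Return one error line per GEN_DATA key whose result file is missing.

    `gen_data` maps a key (e.g. "R_A2_SIM") to a hypothetical result-file
    template containing "%d" (e.g. "RFT_R_A2_%d").
    """
    errors = []
    for key, template in gen_data.items():
        # Substitute the report step into the template (assumed convention).
        path = run_path / (template % report_step)
        # Only checks what the file system reports at this exact moment.
        if not path.is_file():
            errors.append(
                f"Error reading GEN_DATA: {key}, errors: ['Missing output file: {path}']"
            )
    return errors
```

A check like this only reflects what the file system reports at the moment the callback runs, so the timing of the callback relative to the job finishing matters.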

To reproduce

Steps to reproduce the behaviour:

  1. pip install ert
  2. ert gui my_config.ert
  3. Run experiment (IES/Smoother/ESMDA/Test)

Expected behaviour

Realizations whose forward model steps all complete successfully (and write the OK file) should not be marked as failed.

Environment

  • OS: RHEL7
  • ERT/Komodo release: 2024.04
  • Python version: not stated
  • Remote/HPC execution involved: yes

larsevj, Apr 19 '24 17:04

The same error was also seen in ert-internal-examples when building the 2024.04.04 release: https://github.com/equinor/komodo-releases/actions/runs/8755056479/job/24055171964

larsevj, Apr 20 '24 12:04

We need to check whether the file is actually on disk. If it is, we should consider adding a short wait in the callback to allow the file system to synchronise.
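One way to express the proposed "short wait" is a bounded poll before giving up on the file. This is only a sketch. The function `wait_for_file`, the 5 s default timeout, and the 0.1 s poll interval are illustrative assumptions, not values taken from ERT.

```python
import time
from pathlib import Path


def wait_for_file(path: Path, timeout: float = 5.0, poll_interval: float = 0.1) -> bool:
    """Poll until `path` exists or `timeout` seconds have passed.

    Returns True if the file appeared in time, False otherwise.
    (Hypothetical helper; timeout and interval values are illustrative.)
    """
    # Monotonic clock, so wall-clock adjustments cannot shorten or extend the wait.
    deadline = time.monotonic() + timeout
    while True:
        if path.is_file():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

The upper bound means a genuinely missing file cannot stall the callback indefinitely. Within that bound, a shared file system that is slow to show a freshly written file may still have time to do so.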

sondreso, Apr 22 '24 07:04

In the case of ert-internal-examples, the file does appear to be on disk:

cat RFT_RWI_3_1
271.8949890136719
268.4920349121094
275.8153991699219
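The file above contains one floating-point value per line. The following is a minimal reader for that layout. It assumes plain ASCII text with one value per line, and `read_gen_data_values` is a hypothetical name. The reader raises different errors for a missing file and for a file that is present but cannot be parsed.

```python
from pathlib import Path


def read_gen_data_values(path: Path) -> list[float]:
    """Read one floating-point value per non-empty line.

    Raises FileNotFoundError if the file is missing and ValueError if a
    line cannot be parsed, so a caller can tell "not on disk" apart from
    "present but malformed".
    """
    with path.open() as f:
        # float() tolerates the trailing newline on each line.
        return [float(line) for line in f if line.strip()]
```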

larsevj, Apr 22 '24 08:04

This would be solved by #7788.

eivindjahren, Apr 29 '24 09:04