ert
Realization marked as failed, but all fm steps completed (2024.04)
Ran a Drogon case. On iteration 3, four realizations were marked as failed in the GUI, even though all forward model steps were marked as successful and the OK file was written. The following error message was found in the logs:
status from done callback: Error reading GEN_DATA: R_A2_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/RFT_R_A2_1']
Error reading GEN_DATA: R_A3_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/RFT_R_A3_1']
Error reading GEN_DATA: R_A4_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/RFT_R_A4_1']
Error reading GEN_DATA: R_A5_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/RFT_R_A5_1']
Error reading GEN_DATA: R_A6_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/RFT_R_A6_1']
Error reading GEN_DATA: TRACER_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/tracer/drogon_tracer_sim_1.txt']
Error reading GEN_DATA: AMP_2020_2018_TOP, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/share/results/points/topvolantis_amplitude_mean_20200701_20180101_1.txt']
Error reading GEN_DATA: AMP_2020_2018_BASE, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/share/results/points/basevolantis_amplitude_mean_20200701_20180101_1.txt']
ERROR Realization: 60 failed after reaching max submit (1):
To reproduce
Steps to reproduce the behaviour:
- pip install ert
- ert gui my_config.ert
- Run experiment (IES/Smoother/ESMDA/Test)
- …
Expected behaviour A clear and concise description of what you expected to happen.
Environment
- OS: [RHEL7]
- ERT/Komodo release: [2024.04]
- Python version
- Remote/HPC execution involved: [yes]
The same error was also seen in ert-internal-examples when building the 2024.04.04 release: https://github.com/equinor/komodo-releases/actions/runs/8755056479/job/24055171964
We need to check whether the file is on disk. If it is, we need to reconsider whether the callback should wait briefly to allow for disk synchronisation.
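One way to do the on-disk check for many realizations at once is to pull the reported paths out of the log text and test each one. The following is a minimal sketch, not part of ert: the function names `missing_paths` and `check_on_disk` are hypothetical, and the regular expression assumes the log lines keep the `Missing output file: <path>` format shown in the error message above.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumption: every reported path follows "Missing output file: " and ends
# at the closing quote, as in the log excerpt in this issue.
MISSING_RE = re.compile(r"Missing output file: ([^'\]]+)")


def missing_paths(log_text: str) -> list[Path]:
    """Return every path reported as missing, in order of appearance."""
    return [Path(match.strip()) for match in MISSING_RE.findall(log_text)]


def check_on_disk(log_text: str) -> dict[Path, bool]:
    """Map each reported path to whether it exists on disk right now."""
    return {path: path.exists() for path in missing_paths(log_text)}
```

If `check_on_disk` reports `True` for paths that the callback claimed were missing, that points towards a timing or synchronisation issue rather than a forward model that never wrote its output.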
In the case of ert-internal-examples the file does seem to be on disk:
cat RFT_RWI_3_1
271.8949890136719
268.4920349121094
275.8153991699219
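Since the file does appear on disk afterwards, the short wait mentioned above could look roughly like the sketch below. This is not ert's callback code: `wait_for_file`, `timeout`, and `poll_interval` are hypothetical names and defaults chosen for illustration, and the sketch only polls for existence, so it does not guarantee that the file has been fully written.

```python
import time
from pathlib import Path


def wait_for_file(path, timeout=10.0, poll_interval=0.5):
    """Poll until `path` exists or `timeout` seconds have passed.

    Returns True if the file was seen, False if the wait timed out.
    """
    path = Path(path)
    deadline = time.monotonic() + timeout
    while True:
        if path.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep between checks so a slow shared disk is not hammered.
        time.sleep(poll_interval)
```

A caller would report the output as missing only when `wait_for_file` returns `False`. The trade-off is that realizations with genuinely missing output take up to `timeout` seconds longer to fail.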
This would be solved by #7788.