f3dasm
Failure to wait for the data_object creation
When I run a DOE with f3dasm, a few nodes occasionally fail with the following error and quit:
Error executing job with overrides: ['++hpc.jobid=4', 'hp_tune.model=baseline', 'hp_tune.model_seed=-1']
Traceback (most recent call last):
  File "/home/sanusm/.conda/envs/to_jax_env/lib/python3.9/site-packages/f3dasm/design/experimentdata.py", line 271, in _from_file_attempt
    domain = Domain.from_file(Path(f"{filename}_domain"))
  File "/home/sanusm/.conda/envs/to_jax_env/lib/python3.9/site-packages/f3dasm/design/domain.py", line 71, in from_file
    raise FileNotFoundError(f"Domain file {filename} does not exist.")
FileNotFoundError: Domain file exp_data_baseline_domain does not exist.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sanusm/.conda/envs/to_jax_env/lib/python3.9/site-packages/f3dasm/design/experimentdata.py", line 145, in from_file
    return cls._from_file_attempt(filename, text_io)
  File "/home/sanusm/.conda/envs/to_jax_env/lib/python3.9/site-packages/f3dasm/design/experimentdata.py", line 283, in _from_file_attempt
    raise FileNotFoundError(f"Cannot find the file {filename}_data.csv.")
FileNotFoundError: Cannot find the file exp_data_baseline_data.csv.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sanusm/.conda/envs/to_jax_env/lib/python3.9/site-packages/f3dasm/design/experimentdata.py", line 271, in _from_file_attempt
    domain = Domain.from_file(Path(f"{filename}_domain"))
  File "/home/sanusm/.conda/envs/to_jax_env/lib/python3.9/site-packages/f3dasm/design/domain.py", line 71, in from_file
    raise FileNotFoundError(f"Domain file {filename} does not exist.")
FileNotFoundError: Domain file /gpfs/home5/sanusm/phd/TO-JAX/experiments/benchmarking/hp_tuning_b/exp_data_baseline_domain does not exist.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/gpfs/home5/sanusm/phd/TO-JAX/experiments/benchmarking/hp_tuning_b/main.py", line 80, in main_func
    process(config)
  File "/gpfs/home5/sanusm/phd/TO-JAX/experiments/benchmarking/hp_tuning_b/main.py", line 62, in process
    data = f3dasm.ExperimentData.from_file(filename='exp_data_{}'.format(
  File "/home/sanusm/.conda/envs/to_jax_env/lib/python3.9/site-packages/f3dasm/design/experimentdata.py", line 152, in from_file
    return cls._from_file_attempt(filename_with_path, text_io)
  File "/home/sanusm/.conda/envs/to_jax_env/lib/python3.9/site-packages/f3dasm/design/experimentdata.py", line 283, in _from_file_attempt
    raise FileNotFoundError(f"Cannot find the file {filename}_data.csv.")
FileNotFoundError: Cannot find the file /gpfs/home5/sanusm/phd/TO-JAX/experiments/benchmarking/hp_tuning_b/exp_data_baseline_data.csv.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
I am using f3dasm version 1.3.0.
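To illustrate the behaviour I would expect, here is a minimal sketch of a user-side workaround, assuming the failure is a race in which some nodes try to read the experiment files before another job has finished creating them. The helper `wait_for_files` and its `timeout` and `poll_interval` parameters are hypothetical names of my own, not part of f3dasm; it only polls the file system until the given paths exist.

```python
import time
from pathlib import Path
from typing import Iterable, Union


def wait_for_files(
    paths: Iterable[Union[str, Path]],
    timeout: float = 60.0,
    poll_interval: float = 1.0,
) -> bool:
    """Return True once every path exists, False if the timeout expires first.

    Hypothetical helper (not an f3dasm function): it only checks for
    existence, so it cannot tell whether a file has been fully written.
    """
    targets = [Path(p) for p in paths]
    deadline = time.monotonic() + timeout
    while True:
        # Succeed as soon as all expected files are visible on this node.
        if all(p.exists() for p in targets):
            return True
        # Give up once the deadline has passed.
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

In my script this would be called just before `f3dasm.ExperimentData.from_file(...)`, passing the files that the traceback reports as missing (for example the `exp_data_baseline_data.csv` file), and the job would only continue if it returns `True`. Since existence alone does not guarantee a complete file, this is a stopgap rather than a proper fix.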