PDEBench
Darcy Flow Config Issues
I have run into two issues with Darcy flow's config files:
- There are two config files, config_darcy.yaml and args/config_Darcy.yaml. The documentation points to config_Darcy.yaml (capital D), but config_darcy.yaml (lowercase d) seems newer and more correct...? Should it be updated to fully replace the old one?
- config_darcy.yaml works with FNO, but fails with Unet. By default, config_darcy sets initial_step=1 and t_train=1. I believe this is an error because the AR loops (here and here) go from initial_step to t_train, so range(initial_step, t_train) is empty and the loop body never runs. This produces a confusing error, because loss is initialized as a Python int. Since the loop is empty, nothing is added onto loss, so it stays an int:
Unet
Epochs = 500, learning rate = 0.001, scheduler step = 100, scheduler gamma = 0.5
Spatial Dimension 2
Total parameters = 7762465
start training...
Error executing job with overrides: []
Traceback (most recent call last):
File "/dfs6/pub/afeeney/opensource/PDEBench/pdebench/models/train_models_forward.py", line 199, in main
run_training_Unet(
File "/data/homezvol2/afeeney/.conda/envs/pdebench/lib/python3.10/site-packages/pdebench/models/unet/train.py", line 414, in run_training
train_l2_step += loss.item()
AttributeError: 'int' object has no attribute 'item'
I was able to get it running by setting t_train=2. I don't fully follow how the Darcy setup works, though, so I'm not sure whether that's a correct fix...
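The failure mode described above can be reproduced without PDEBench. This is a minimal sketch that assumes, as the issue describes, that the AR loop iterates over range(initial_step, t_train) and that loss starts as the int 0. Scalar and ar_loss are hypothetical names; Scalar stands in for a 0-d torch.Tensor so the example runs without PyTorch.

```python
# Sketch of the empty-AR-loop failure, under stated assumptions:
# - the AR loop runs over range(initial_step, t_train), per the issue text;
# - `Scalar` is a hypothetical stand-in for a 0-d torch.Tensor so this runs
#   without PyTorch (it only supports + and .item()).


class Scalar:
    """Hypothetical stand-in for a 0-d tensor."""

    def __init__(self, value: float) -> None:
        self.value = value

    def __add__(self, other):
        # Scalar + Scalar or Scalar + number -> Scalar, like tensor addition
        other_value = other.value if isinstance(other, Scalar) else other
        return Scalar(self.value + other_value)

    # int + Scalar falls back to Scalar.__radd__, like 0 + tensor -> tensor
    __radd__ = __add__

    def item(self) -> float:
        return self.value


def ar_loss(initial_step: int, t_train: int):
    loss = 0  # initialized as a plain Python int, as in the traceback
    for _ in range(initial_step, t_train):
        loss += Scalar(1.0)  # placeholder per-step loss
    return loss


for t_train in (1, 2, 3):
    loss = ar_loss(initial_step=1, t_train=t_train)
    try:
        print(t_train, loss.item())
    except AttributeError as err:
        # range(1, 1) is empty, so loss is still the int 0
        print(t_train, "AttributeError:", err)
```

With t_train=1 the loop never runs and .item() raises the same AttributeError as the traceback; with t_train=2 or 3 the loss becomes a Scalar after the first step and .item() works. This only shows why t_train=2 makes the error go away, not whether it is the intended setting for Darcy flow.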
Have you managed to fix this bug? I can't run either FNO or Unet successfully, and I get the same error as you. The dataset I used comes from the data_download folder.
FNO
FNO
Epochs = 30, learning rate = 0.001, scheduler step = 100, scheduler gamma = 0.5
FNODatasetSingle
/home/dp/miniconda3/envs/pdebench/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /home/builder/cbouss/pytorch/croot/pytorch_1685629640362/work/aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Spatial Dimension 2
Total parameters = 465557
Error executing job with overrides: ['+args=config_Darcy.yaml', '++args.filename=2D_DarcyFlow_beta10.0_Train.hdf5', '++args.model_name=FNO']
Traceback (most recent call last):
File "/home/dp/PDEBench/pdebench/models/train_models_forward.py", line 166, in main
run_training_FNO(
File "/home/dp/miniconda3/envs/pdebench/lib/python3.9/site-packages/pdebench/models/fno/train.py", line 227, in run_training
train_l2_step += loss.item()
AttributeError: 'int' object has no attribute 'item'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Unet
Unet
Epochs = 30, learning rate = 0.001, scheduler step = 100, scheduler gamma = 0.5
Spatial Dimension 2
Total parameters = 7765057
start training...
Error executing job with overrides: ['+args=config_Darcy.yaml', '++args.filename=2D_DarcyFlow_beta10.0_Train.hdf5', '++args.model_name=Unet']
Traceback (most recent call last):
File "/home/dp/PDEBench/pdebench/models/train_models_forward.py", line 200, in main
run_training_Unet(
File "/home/dp/miniconda3/envs/pdebench/lib/python3.9/site-packages/pdebench/models/unet/train.py", line 414, in run_training
train_l2_step += loss.item()
AttributeError: 'int' object has no attribute 'item'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
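Both traces above fail at the same train_l2_step += loss.item() line, which fits the empty-loop explanation in the first post if config_Darcy.yaml also leaves no steps between initial_step and t_train (an assumption; its values are not shown here). A check along these lines, run before training, would turn the confusing AttributeError into a readable configuration error. check_ar_window is a hypothetical helper, not part of PDEBench, and where it would be called (for example at the start of each run_training) is also an assumption.

```python
# Hypothetical pre-training check (not part of PDEBench): validate the AR
# window before training so an empty range(initial_step, t_train) fails with a
# readable message instead of "'int' object has no attribute 'item'".


def check_ar_window(initial_step: int, t_train: int) -> int:
    """Return the number of autoregressive steps, or raise ValueError if none."""
    n_steps = t_train - initial_step  # equals len(range(initial_step, t_train)) when >= 0
    if n_steps < 1:
        raise ValueError(
            f"t_train ({t_train}) must be greater than initial_step "
            f"({initial_step}); otherwise the AR loop never runs and the "
            "loss is never accumulated."
        )
    return n_steps


print(check_ar_window(initial_step=1, t_train=2))  # one AR step
try:
    check_ar_window(initial_step=1, t_train=1)  # the config_darcy defaults
except ValueError as err:
    print("config error:", err)
```

Failing early keeps the fix in the config (raise t_train) rather than silently training on zero steps.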