neuraltexture
Training Fails on Validation Sanity Check
Hi, when I try to train on a new dataset, training fails with the following error.
[PYTHON_ENV_PATH]/neuraltexture/bin/python -u [PROJECT_ROOT]/code/train_neural_texture.py
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Use pytorch 1.4.0
Load config: configs/neural_texture/config_default.yaml
INFO:lightning:GPU available: True, used: True
INFO:lightning:CUDA_VISIBLE_DEVICES: [0]
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:23: RuntimeWarning: You have defined a `val_dataloader()` and have defined a `validation_step()`, you may also want to define `validation_epoch_end()` for accumulating stats.
warnings.warn(*args, **kwargs)
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:23: RuntimeWarning: You have defined a `test_dataloader()` and have defined a `test_step()`, you may also want to define `test_epoch_end()` for accumulating stats.
warnings.warn(*args, **kwargs)
Validation sanity check: 0it [00:00, ?it/s]
Traceback (most recent call last):
File "[PROJECT_ROOT]/neuraltexture/code/train_neural_texture.py", line 47, in <module>
trainer.fit(system)
File "[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 765, in fit
self.single_gpu_train(model)
File "[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 492, in single_gpu_train
self.run_pretrain_routine(model)
File "[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 896, in run_pretrain_routine
eval_results = self._evaluate(model,
File "[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 322, in _evaluate
eval_results = model.validation_end(outputs)
File "[PROJECT_ROOT]/neuraltexture/code/systems/s_core.py", line 33, in validation_end
for key in outputs[0].keys():
IndexError: list index out of range
Process finished with exit code 1
Additional Information
- My dataset: As a sanity check, I use all the test images you provided as my dataset. I have a folder called "all" in the "datasets" directory with two sub-directories, "train" and "test", and I copied all of the provided test images into both of them.
- The working directory is "[PROJECT_ROOT]/code".
- My Operating System is Ubuntu 16.04.
- PyTorch Lightning 0.7.5 is installed.
My "config_default.yaml" is shown below:
version_name: neuraltexture_all_2d_single
device: cuda
n_workers: 8
n_gpus: 1
dim: 2
noise:
octaves: 8
logger:
log_files_every_n_iter: 1000
log_scalars_every_n_iter: 100
log_validation_every_n_epochs: 1
image:
image_res: &image_res 128 # (height, width)
texture:
e: &texture_e 64 # encoding size
dataset:
name: datasets.images
path: '../datasets/all'
use_single: -1 # -1 = all, 0,1,2 for single
system:
block_main:
model_texture_encoder:
model_params:
name: models.neural_texture.encoder
type: 'ResNet'
shape_in: [[3, *image_res, *image_res]]
bottleneck_size: 8
model_texture_mlp:
model_params:
name: models.neural_texture.mlp
type: 'MLP'
n_max_features: 128
n_blocks: 4
dropout_ratio: 0.0
non_linearity: 'relu'
bias: True
encoding: *texture_e
optimizer_params:
name: 'adam'
lr: 0.0001
weight_decay: 0.0001
scheduler_params:
name: 'none'
loss_params:
style_weight: 1.0
style_type: 'mse'
train:
epochs: 3
bs: 16
accumulate_grad_batches: 1
seed: 41127
Your help is much appreciated.
I had the same issue and tweaked the code in validation_end a bit:
    if len(outputs) > 0:
        for key in outputs[0].keys():
            logs[key] = torch.stack([x[key] for x in outputs]).mean()
    else:
        logs['val_loss'] = torch.tensor(0.)
This is very ad hoc. I think the code needs a 'val' folder as well as 'train' and 'test'.
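For anyone who wants to test the missing-folder guess before patching validation_end, here is a minimal sketch of a pre-flight check. It only counts files per split folder under the dataset root. The split names ("train", "val", "test") and the helper names count_split_files and empty_splits are assumptions made up for this example; they are not taken from the repository, so adjust them to whatever the data loader actually reads.

```python
from pathlib import Path

# Assumption: the loader looks for one sub-directory per split under the
# dataset root. The split names below are a guess based on this thread,
# not something confirmed from the repository.
EXPECTED_SPLITS = ("train", "val", "test")


def count_split_files(dataset_root, splits=EXPECTED_SPLITS):
    """Return {split_name: number_of_files}; missing folders count as 0."""
    root = Path(dataset_root)
    counts = {}
    for split in splits:
        split_dir = root / split
        if split_dir.is_dir():
            # Count regular files at any depth below the split folder.
            counts[split] = sum(1 for p in split_dir.rglob("*") if p.is_file())
        else:
            counts[split] = 0
    return counts


def empty_splits(dataset_root, splits=EXPECTED_SPLITS):
    """List the splits that have no files and would yield zero batches."""
    counts = count_split_files(dataset_root, splits)
    return [name for name, n in counts.items() if n == 0]


if __name__ == "__main__":
    # Path taken from the config posted above, relative to the "code" directory.
    missing = empty_splits("../datasets/all")
    if missing:
        print("Empty or missing splits:", ", ".join(missing))
    else:
        print("All expected splits contain files.")
```

If a split comes back empty, the validation loop would run over zero batches, which would be consistent with the "0it" progress bar and the empty outputs list in the traceback. In that case adding the folder is probably a cleaner fix than logging a val_loss of 0.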