robot-surgery-segmentation
The program fails to train. I am using one GPU to run it:
num train = 0, num_val = 0
Traceback (most recent call last):
File "train.py", line 157, in
First of all, "num train = 0, num_val = 0" looks strange. Are you sure that your DataLoader defined in https://github.com/ternaus/robot-surgery-segmentation/blob/master/dataset.py is correct?
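A quick way to rule out an empty file list is to count the images the training script would actually find. This is a minimal sketch, assuming the cropped images sit in <root>/instrument_dataset_<k>/images/ (our reading of the layout, not confirmed here); the helper count_images and the path data/cropped_train are hypothetical.

```python
from pathlib import Path


def count_images(root, pattern="instrument_dataset_*/images/*"):
    """Count files matching `pattern` under `root`.

    Assumption: images live in <root>/instrument_dataset_<k>/images/.
    Adjust the pattern if your copy of the repository uses another layout.
    """
    root = Path(root)
    if not root.is_dir():
        # A missing directory silently produces an empty file list,
        # which would show up as "num train = 0, num_val = 0".
        return 0
    return sum(1 for p in root.glob(pattern) if p.is_file())


if __name__ == "__main__":
    # Hypothetical path, relative to the directory you run train.py from.
    print(count_images("data/cropped_train"))
```

If this prints 0, the problem is the paths rather than the model.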
Second, model.load_state_dict(state['model']) is trying to load a saved model, which happens when your runs/debug folder is not empty. Can you delete it and try again?
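Deleting the folder by hand works; the same reset can be scripted so every debug run starts from scratch. A minimal sketch, assuming (as described above) that training resumes from whatever checkpoint it finds in the run directory; the helper reset_run_dir is hypothetical and not part of the repository.

```python
import shutil
from pathlib import Path


def reset_run_dir(run_dir="runs/debug"):
    """Delete and recreate a run directory so training starts fresh.

    Assumption: the training script loads any checkpoint left in this
    directory, so stale checkpoints from a different model or problem
    type trigger load_state_dict errors.
    """
    run_dir = Path(run_dir)
    if run_dir.exists():
        shutil.rmtree(run_dir)  # removes old checkpoints and logs
    run_dir.mkdir(parents=True)
    return run_dir
```

Only point this at a directory whose contents you are willing to lose.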
Yes, I deleted the "runs/debug" folder and tried again. That fixed the "RuntimeError: Error(s) in loading state_dict for DataParallel" problem, but I still get "num train = 0, num_val = 0".
Commands:
python prepare_train_val.py
python train.py --device-ids 0 --batch-size 16 --fold $3 --workers 12 --lr 0.00001 --n-epochs 20 --type binary --jaccard-weight 1 --model UNet16
Log:
num train = 0, num_val = 0
Epoch 1, lr 1e-05: : 0it [00:00, ?it/s]
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py:2957: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/usr/local/lib/python3.6/dist-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Valid loss: nan, jaccard: nan
Epoch 2, lr 1e-05: : 0it [00:00, ?it/s]
Valid loss: nan, jaccard: nan
Epoch 3, lr 1e-05: : 0it [00:00, ?it/s]
Valid loss: nan, jaccard: nan
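The nan values in this log are a consequence of the empty dataset, not a separate bug: with 0 batches, the validation code presumably averages an empty list, and NumPy returns nan with "Mean of empty slice". A minimal sketch of a fail-fast guard, assuming per-batch losses are collected in a list; the helper mean_or_fail is hypothetical.

```python
import warnings

import numpy as np


def mean_or_fail(values, name="validation losses"):
    """Average per-batch values, refusing to do so when the list is empty.

    np.mean([]) returns nan and emits "RuntimeWarning: Mean of empty slice",
    the same symptom as "Valid loss: nan" above.
    """
    if len(values) == 0:
        raise ValueError(f"no {name} collected; is the dataset empty?")
    return float(np.mean(values))


with warnings.catch_warnings():
    # Silence the RuntimeWarnings just for this demonstration.
    warnings.simplefilter("ignore", RuntimeWarning)
    print(np.mean([]))  # nan
```

Raising on the first epoch is easier to diagnose than 20 epochs of nan.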
My folder layout is:
surgery/data/models/
surgery/data/train/instrument_dataset_1
surgery/data/test/instrument_dataset_1
surgery/data/cropped_train/instrument_dataset_1
surgery/data/train.py
surgery/data/model.py
surgery/data/prepare_data.py
surgery/data/prepare_train_val.py
surgery/data/dataset.py
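With the scripts stored inside surgery/data/, it is worth checking where the data path actually resolves. If the scripts refer to the data folder by a relative path such as Path('data') (an assumption; check prepare_data.py in your copy), that path is resolved against the current working directory, so running from surgery/data/ would look in surgery/data/data/. A minimal sketch; the helper describe_data_dir is hypothetical.

```python
from pathlib import Path


def describe_data_dir(data_dir="data"):
    """Report where a (possibly relative) data path resolves and what it holds.

    Relative paths are resolved against the current working directory,
    not against the location of the script that uses them.
    """
    resolved = Path(data_dir).resolve()
    if not resolved.is_dir():
        return f"{resolved} does not exist"
    children = sorted(p.name for p in resolved.iterdir())
    return f"{resolved}: {', '.join(children) or '(empty)'}"


if __name__ == "__main__":
    # Run this from the same directory you run train.py from.
    print(describe_data_dir())
```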
Could you share the dataset that goes in surgery/data/train/instrument_dataset_1 and surgery/data/test/instrument_dataset_1?
For anyone encountering this error: check whether you changed the problem type, for example in
model = get_model(model_path, model_type='UNet11', problem_type='instruments')
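The problem type matters because it sets the number of output channels in the final layer, so a checkpoint trained with one type cannot be loaded into a model built for another (PyTorch's strict load_state_dict reports a size mismatch). A minimal sketch of an early check; the mapping below (binary 1, parts 4, instruments 8) is our assumption about train.py and should be verified, and check_problem_type is hypothetical.

```python
# Assumed mapping from problem type to number of output channels;
# verify against the num_classes logic in your copy of train.py.
NUM_CLASSES = {"binary": 1, "parts": 4, "instruments": 8}


def check_problem_type(trained_with, loading_as):
    """Raise early if a checkpoint is loaded under a different problem type."""
    for t in (trained_with, loading_as):
        if t not in NUM_CLASSES:
            raise ValueError(f"unknown problem type: {t!r}")
    if NUM_CLASSES[trained_with] != NUM_CLASSES[loading_as]:
        raise ValueError(
            f"checkpoint has {NUM_CLASSES[trained_with]} output channel(s) "
            f"({trained_with}), but the model expects "
            f"{NUM_CLASSES[loading_as]} ({loading_as})"
        )
```

For instance, a model trained with --type binary and then loaded with problem_type='instruments' fails this check.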
Regarding the request for the dataset in surgery/data/train/instrument_dataset_1 and surgery/data/test/instrument_dataset_1: you might find this link useful: https://github.com/ternaus/robot-surgery-segmentation/issues/3#issuecomment-384948063