tfoptflow
Bad performance on MPI-Sintel
Hi, I have used your pretrained model to fine-tune on MPI-Sintel. The EPE on the test set was 6.2. Have you tried it?
To fine-tune on the MPI-Sintel dataset you have to change the dataset options. You can find the respective settings in:
[1] Deqing Sun, Xiaodong Yang, Ming-Yu Liu, and Jan Kautz. "PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume." CVPR 2018, [arXiv:1709.02371](https://arxiv.org/abs/1709.02371)
and set them to:
ds_opts = deepcopy(_DEFAULT_DS_TUNE_OPTIONS)
ds_opts['in_memory'] = False
ds_opts['aug_type'] = 'heavy'
ds_opts['flipud'] = 0 # Only apply horizontal flipping for data augmentation, see [1]
ds_opts['translate'] = (0,0) # Only apply horizontal flipping for data augmentation, see [1]
ds_opts['scale'] = (0,0) # Only apply horizontal flipping for data augmentation, see [1]
ds_opts['crop_preproc'] = (384, 768) # Crop to the size described in [1]
ds_opts['batch_size'] = 4
and
# The robust loss as described in [1] doesn't work here, so try the following:
nn_opts['loss_fn'] = 'loss_multiscale'
nn_opts['q'] = 0.4 # see [1]
nn_opts['epsilon'] = 0.01 # see [1]
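For reference, the fine-tuning loss these options approximate is the robust multiscale loss from [1]: sum_l alpha_l * sum_x (|w^l(x) - w_gt^l(x)| + epsilon)^q, plus weight decay. Below is a minimal TF1 sketch of that formula; the names flow_preds, flow_gt_pyramid, and alphas are illustrative placeholders, not this repo's API:

import tensorflow as tf

def robust_multiscale_loss(flow_preds, flow_gt_pyramid, alphas, q=0.4, epsilon=0.01):
    """flow_preds / flow_gt_pyramid: per-level lists of [B, H_l, W_l, 2] tensors."""
    total = 0.
    for pred, gt, alpha in zip(flow_preds, flow_gt_pyramid, alphas):
        l1 = tf.reduce_sum(tf.abs(pred - gt), axis=3)  # per-pixel L1 over (u, v)
        robust = tf.pow(l1 + epsilon, q)               # (|.| + eps)^q; q < 1 downweights outliers
        total += alpha * tf.reduce_mean(tf.reduce_sum(robust, axis=[1, 2]))
    return total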
By fine-tuning on clean and final and evaluating on the training data I got:
- clean 1.4 EPE
- final 1.88 EPE
however, the results on the test data are considerably worse than the originally reported ones:
- clean 5.13 (Place 83) in contrast to 4.37 of the original
- final 6.50 (Place 77) in contrast to 5.04 of the original
I have used the lg-6-2 Net. Could this be an issue of over-fitting? I would appreciate any help to get better results on the test data.
I think the differences above can be explained as follows:
1. You should take care with the choice of the validation set, see https://github.com/lmb-freiburg/flownet2/issues?utf8=%E2%9C%93&q=320
2. The data augmentations used in this code differ a little from those in the original FlowNet paper, see https://github.com/philferriere/tfoptflow/issues/10. When training on Chairs, you should add them (one flipping subtlety is sketched below).
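Regarding point 2, one augmentation subtlety that is easy to get wrong when only horizontal flipping is enabled: flipping an image pair left-right also requires negating the horizontal flow component. A minimal NumPy sketch, independent of this repo's augmenter:

import numpy as np

def hflip_flow_sample(img1, img2, flow):
    """Horizontally flip an optical-flow sample: img1/img2 are [H, W, 3], flow is [H, W, 2] (u, v)."""
    img1, img2 = img1[:, ::-1], img2[:, ::-1]
    flow = flow[:, ::-1].copy()
    flow[..., 0] *= -1.0  # mirror the horizontal (u) component
    return img1, img2, flow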
Thanks, I will give it a try, but you mentioned FlowNet2. I want to replicate the PWC-Net results.
Did you replicate the results successfully?
Do you mean for FlowNet2 or PWC-Net?
pwc-net
Apart from the ones reported above, I have not done any further experiments.
Thank you so much!
@tsenst Hi, I also have this problem. Did you find the reason and a corresponding solution?
@tsenst Hi, when I fine-tune the model on MPI-Sintel with your options, the loss and epe are all 'nan'. Did you meet this problem?
@tsenst Hi, I also have this problem. Did you find the reason and a corresponding solution?
Hi~ Have you solved the problems?
No solution, probably because of the data augmentation.
@xianshunw @Blcony Hi, I tried to fine-tune or train on MPI-Sintel, but the loss and epe are all 'nan', like this:
2019-09-17 00:36:04 Iter 1000 [Train]: loss=nan, epe=nan, lr=0.000100, samples/sec=6.4, sec/step=0.628, eta=17 days, 10:29:35
2019-09-17 00:36:14 Iter 1000 [Val]: loss=nan, epe=nan
The fine-tuning code is:
from __future__ import absolute_import, division, print_function
from copy import deepcopy
from dataset_base import _DEFAULT_DS_TUNE_OPTIONS
from dataset_mpisintel import MPISintelDataset
from model_pwcnet import ModelPWCNet, _DEFAULT_PWCNET_FINETUNE_OPTIONS
# TODO: You MUST set dataset_root to the correct path on your machine!
_DATASET_ROOT = '/home/zyy/opticalflow/data/'
_MPI_ROOT = _DATASET_ROOT + 'MPI-Sintel'
gpu_devices = ['/device:GPU:0']
controller = '/device:GPU:0'
# TODO: You MUST set the batch size below based on the memory capabilities of your GPU(s)
batch_size = 8
# Load train dataset
ds_opts = deepcopy(_DEFAULT_DS_TUNE_OPTIONS)
ds_opts['in_memory'] = False # Too many samples to keep in memory at once, so don't preload them
ds_opts['aug_type'] = 'heavy' # Apply all supported augmentations
ds_opts['batch_size'] = batch_size * len(gpu_devices) # Total batch size; here, 8 for single-GPU mode
ds_opts['crop_preproc'] = (384, 768) # Crop to the size described in [1]; (256, 448) is a smaller alternative
ds_opts['train_mode'] = 'fine-tune'
ds_opts['type'] = 'final'
ds_opts['flipud'] = 0 # Only apply horizontal flipping for data augmentation, see [1]
ds_opts['translate'] = (0,0) # Only apply horizontal flipping for data augmentation, see [1]
ds_opts['scale'] = (0,0) # Only apply horizontal flipping for data augmentation, see [1]
ds = MPISintelDataset(mode='train_with_val', ds_root=_MPI_ROOT, options=ds_opts)
# Display dataset configuration
ds.print_config()
# Start from the default options
nn_opts = deepcopy(_DEFAULT_PWCNET_FINETUNE_OPTIONS)
nn_opts['verbose'] = True
nn_opts['ckpt_path'] = './models/pwcnet-sm-6-2-multisteps-chairsthingsmix/pwcnet.ckpt-592000'
nn_opts['ckpt_dir'] = './pwcnet-sm-6-2-cyclic-mpisintel_finetuned/MPI-Sintel_onlyfinal'
nn_opts['batch_size'] = ds_opts['batch_size']
nn_opts['x_shape'] = [2, ds_opts['crop_preproc'][0], ds_opts['crop_preproc'][1], 3]
nn_opts['y_shape'] = [ds_opts['crop_preproc'][0], ds_opts['crop_preproc'][1], 2]
nn_opts['use_tf_data'] = True # Use tf.data reader
nn_opts['gpu_devices'] = gpu_devices
nn_opts['controller'] = controller
nn_opts['train_mode'] = 'fine-tune'
# Use the PWC-Net-small model in quarter-resolution mode
nn_opts['use_dense_cx'] = False
nn_opts['use_res_cx'] = False
nn_opts['pyr_lvls'] = 6
nn_opts['flow_pred_lvl'] = 2
# The robust loss as described in [1] doesn't work here, so use the multiscale loss instead
nn_opts['loss_fn'] = 'loss_multiscale' # alternative: 'loss_robust'
nn_opts['q'] = 0.4 # see [1]
nn_opts['epsilon'] = 0.01 # see [1]
# Set the learning rate schedule. This schedule is for a single GPU using a batch size of 8.
# Below, we adjust the schedule to the size of the batch and the number of GPUs.
nn_opts['lr_policy'] = 'multisteps'
nn_opts['init_lr'] = 1e-05
nn_opts['lr_boundaries'] = [80000, 120000, 160000, 200000]
nn_opts['lr_values'] = [1e-05, 5e-06, 2.5e-06, 1.25e-06, 6.25e-07]
nn_opts['max_steps'] = 200000
# Adjust the schedule to the batch size and the number of GPUs actually used
nn_opts['max_steps'] = int(nn_opts['max_steps'] * 8 / ds_opts['batch_size'])
nn_opts['cyclic_lr_stepsize'] = int(nn_opts['cyclic_lr_stepsize'] * 8 / ds_opts['batch_size']) # likely unused with the 'multisteps' lr_policy
# Instantiate the model and display the model configuration
nn = ModelPWCNet(mode='train_with_val', options=nn_opts, dataset=ds)
nn.print_config()
# Train the model
nn.train()
Have you ever met this problem?
Hi~ I haven't tried to fine-tune on MPI-Sintel, but maybe this link (https://github.com/philferriere/tfoptflow/issues/7) is helpful for you.
@Blcony by any chance did you manage to implement this (#7) solution? Could you post the code here? I think it should be added between lines 549 and 553 of model_pwcnet.py.
Thanks, Stefano
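In case it helps until the exact #7 snippet surfaces: a common generic mitigation for losses going nan during training is clipping the global gradient norm before applying the update. Below is a TF1 sketch, not taken from model_pwcnet; the tiny dummy loss, optimizer choice, and clip_norm value are assumptions just to make it runnable:

import tensorflow as tf

# Generic TF1 gradient clipping; the dummy variable/loss below only make the sketch self-contained.
x = tf.Variable([1.0, -2.0])
loss = tf.reduce_sum(tf.square(x))
optimizer = tf.train.AdamOptimizer(learning_rate=1e-5)
grads, tvars = zip(*optimizer.compute_gradients(loss))
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=1.0)  # cap the global gradient norm
train_op = optimizer.apply_gradients(zip(clipped_grads, tvars))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)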
Hi~ I haven't tried to fine-tune on MPI-Sintel, but maybe this link (#7) is helpful for you.
Well, maybe that issue does not solve my problem. I encounter this problem as early as iteration 200, like this:
Start finetuning...
2019-09-17 22:40:18 Iter 100 [Train]: loss=3.39, epe=4.67, lr=0.000010, samples/sec=3.7, sec/step=1.081, eta=5 days, 0:05:35
2019-09-17 22:41:33 Iter 200 [Train]: loss=nan, epe=nan, lr=0.000010, samples/sec=5.6, sec/step=0.710, eta=3 days, 6:48:41
2019-09-17 22:42:37 Iter 300 [Train]: loss=nan, epe=nan, lr=0.000010, samples/sec=6.8, sec/step=0.590, eta=2 days, 17:29:28
2019-09-17 22:43:51 Iter 400 [Train]: loss=nan, epe=nan, lr=0.000010, samples/sec=5.7, sec/step=0.703, eta=3 days, 5:59:56
Thank you very much. Maybe I need to open a new issue.
Hi, I have met the same situation. Moreover, this nan issue appears not only during fine-tuning but also during pretraining on the Chairs-Things mix. Did you find a solution?
When I trained the model with an RTX 3090 + TF 1.15, I got nan in the first steps (global step 1, 2, etc.). I found that TF 1.x does not support the RTX 3090: TF 1.15.x uses CUDA 10.0, and this configuration reports no errors but results in nan loss (even NaN values in the feature maps from the feature_estimator layer). I fixed this by reinstalling TF 1.15 from NVIDIA's maintained build, see https://github.com/nvidia/tensorflow.
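For anyone still hunting such nans, TF1 also offers a built-in way to locate the first op producing non-finite values. A small sketch, assuming a graph-mode TF 1.x setup; the suspect tensor here is a deliberately broken placeholder to demonstrate the check:

import tensorflow as tf

x = tf.Variable([1.0, 0.0])
suspect = tf.log(x)  # produces -inf at x == 0, to trigger the check
checked = tf.check_numerics(suspect, message='suspect tensor became nan/inf')
# Or validate every floating-point tensor in the current graph at once:
check_op = tf.add_check_numerics_ops()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run([checked, check_op])  # raises InvalidArgumentError pointing at the bad op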