mx-DeepIM
NaN values in Predictions
After following the instructions in the latest commit and running train_and_test_deepim_all.sh, I got the following error:
Traceback (most recent call last):
  File "experiments/deepim/deepim_train_test.py", line 20, in <module>
    train.main()
  File "experiments/deepim/../../deepim/train.py", line 287, in main
    config.TRAIN.begin_epoch, config.TRAIN.end_epoch, config.TRAIN.lr, config.TRAIN.lr_step)
  File "experiments/deepim/../../deepim/train.py", line 280, in train_net
    prefix=prefix)
  File "experiments/deepim/../../deepim/core/module.py", line 1026, in fit
    data_batch = interBatchUpdater.forward(data_batch, preds, config)
  File "experiments/deepim/../../lib/pair_matching/batch_updater_py_multi.py", line 231, in forward
    rot_type='QUAT')
  File "experiments/deepim/../../lib/pair_matching/RT_transform.py", line 34, in calc_RT_delta
    r = mat2quat(Rm_delta)
  File "experiments/deepim/../../lib/pair_matching/RT_transform.py", line 459, in mat2quat
    vals, vecs = np.linalg.eigh(K)
  File "/home/saadhana/.local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 1410, in eigh
    w, vt = gufunc(a, signature=signature, extobj=extobj)
  File "/home/saadhana/.local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 95, in _raise_linalgerror_eigenvalues_nonconvergence
    raise LinAlgError("Eigenvalues did not converge")
numpy.linalg.linalg.LinAlgError: Eigenvalues did not converge
It looks like the predicted poses are all NaN. I printed the predicted rotation and translation:
[array([[nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan]], dtype=float32)]
[array([[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]], dtype=float32)]
Has anybody successfully trained the network for LINEMOD or OCCLUSION datasets?
I executed train_and_test_deepim_all.sh and did not encounter this error. Can you provide more information, such as the context in which it occurs? Does it fail in the first iteration or after a few iterations?
On the first batch it completes one iteration; on the next one this error occurs because the predictions are NaN.
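To pin down the exact iteration at which the predictions stop being finite, a check along the following lines could be run on the predicted rotation and translation before they reach np.linalg.eigh. This is only a sketch: check_finite and its arguments are hypothetical names, not part of mx-DeepIM, and where to call it in the training loop is left open.

```python
import numpy as np


def check_finite(name, array, iteration):
    """Raise a descriptive error if `array` contains NaN or Inf values.

    Hypothetical helper, not part of mx-DeepIM.
    """
    array = np.asarray(array)
    bad = ~np.isfinite(array)  # True wherever a value is NaN or +/-Inf
    if bad.any():
        raise FloatingPointError(
            "%s has %d non-finite value(s) at iteration %d"
            % (name, int(bad.sum()), iteration)
        )
    return array


# Example with an all-NaN 4x4 array like the one printed above.
rot_pred = np.full((4, 4), np.nan, dtype=np.float32)
try:
    check_finite("rot_pred", rot_pred, iteration=1)
    message = ""
except FloatingPointError as err:
    message = str(err)
    print(message)
```

Failing early with the array name and iteration number is easier to act on than the "Eigenvalues did not converge" error, which only appears once the NaN values have propagated into the eigendecomposition.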
Can you change frequent under config->default to 1 in experiments/deepim/cfg/*_any/all.yaml (abbreviated as config below), and then tell me the result of applying each of the following modifications separately:
- rerun using train_test_deepim_all.sh
- run train_test_deepim_ape.yaml
- change train_iter_size in config->network to 1 and run any config that reported the error before
- replace dataset: LM6D_REFINE+LM6D_REFINE_SYN with dataset: LM6D_REFINE, and image_set: train_+train_ with image_set: train_
- change config->TRAIN->warmup_lr to 0.0
Tell me what happens after applying each of these modifications. Thank you.
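Since the modifications are meant to be tried one at a time, it may help to generate one config variant per experiment rather than editing the YAML by hand. The sketch below works on a plain nested dict standing in for the loaded config. The helper names, the experiment labels, and the placement of dataset and image_set under a "dataset" key are assumptions; the other key paths and values are the ones quoted above.

```python
import copy


def set_nested(config, path, value):
    """Set config[path[0]][path[1]]... = value, creating dicts as needed."""
    node = config
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value


# Applied in every run: default -> frequent = 1.
COMMON = [(("default", "frequent"), 1)]

# One entry per experiment; each is applied on its own, never combined.
EXPERIMENTS = {
    "iter_size_1": [(("network", "train_iter_size"), 1)],
    "real_data_only": [
        (("dataset", "dataset"), "LM6D_REFINE"),  # nesting is an assumption
        (("dataset", "image_set"), "train_"),     # value as quoted in the thread
    ],
    "no_warmup": [(("TRAIN", "warmup_lr"), 0.0)],
}


def make_variants(base_config):
    """Return {experiment name: modified deep copy of base_config}."""
    variants = {}
    for name, overrides in EXPERIMENTS.items():
        cfg = copy.deepcopy(base_config)  # leave the original untouched
        for path, value in COMMON + overrides:
            set_nested(cfg, path, value)
        variants[name] = cfg
    return variants


# An empty dict stands in for the config loaded from the YAML file.
variants = make_variants({})
```

Each variant differs from the base config only by the common frequent setting and its own override, so a change in behaviour can be attributed to a single modification.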