
NaN values in Predictions

nhafez opened this issue 6 years ago · 3 comments

After following the instructions in the latest commit and then running train_and_test_deepim_all.sh, I got the following error:

Traceback (most recent call last):
  File "experiments/deepim/deepim_train_test.py", line 20, in <module>
    train.main()
  File "experiments/deepim/../../deepim/train.py", line 287, in main
    config.TRAIN.begin_epoch, config.TRAIN.end_epoch, config.TRAIN.lr, config.TRAIN.lr_step)
  File "experiments/deepim/../../deepim/train.py", line 280, in train_net
    prefix=prefix)
  File "experiments/deepim/../../deepim/core/module.py", line 1026, in fit
    data_batch = interBatchUpdater.forward(data_batch, preds, config)
  File "experiments/deepim/../../lib/pair_matching/batch_updater_py_multi.py", line 231, in forward
    rot_type='QUAT')
  File "experiments/deepim/../../lib/pair_matching/RT_transform.py", line 34, in calc_RT_delta
    r = mat2quat(Rm_delta)
  File "experiments/deepim/../../lib/pair_matching/RT_transform.py", line 459, in mat2quat
    vals, vecs = np.linalg.eigh(K)
  File "/home/saadhana/.local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 1410, in eigh
    w, vt = gufunc(a, signature=signature, extobj=extobj)
  File "/home/saadhana/.local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 95, in _raise_linalgerror_eigenvalues_nonconvergence
    raise LinAlgError("Eigenvalues did not converge")
numpy.linalg.linalg.LinAlgError: Eigenvalues did not converge
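
For reference, and not part of the original report: the LinAlgError is raised because np.linalg.eigh is handed a matrix full of NaNs, so a finiteness check on the delta rotation before the quaternion conversion surfaces the real cause instead of crashing inside NumPy. The names below are illustrative, a minimal sketch rather than the repo's actual code:

```python
import numpy as np

def quat_from_mat_checked(rot_mat, mat2quat):
    """Convert a rotation matrix to a quaternion, failing early on NaN input.

    `mat2quat` stands in for an existing conversion routine (such as the one
    in lib/pair_matching/RT_transform.py); `rot_mat` is the delta rotation.
    """
    if not np.all(np.isfinite(rot_mat)):
        # np.linalg.eigh cannot converge on NaN/Inf input, which is what
        # produces the LinAlgError above; report the real cause instead.
        raise ValueError("non-finite rotation matrix, the network "
                         "predictions are probably NaN:\n%r" % (rot_mat,))
    return mat2quat(rot_mat)
```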

It looks like the predicted poses are all NaN. I printed the predicted rotation and translation:

[array([[nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan]], dtype=float32)]
[array([[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]], dtype=float32)]
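
A quick way to see which outputs go bad, and at which iteration, is to scan the forward outputs right after each forward pass. This is only a sketch, assuming preds is the list of NDArrays that fit() hands to the batch updater:

```python
import numpy as np

def report_nan_outputs(preds, nbatch):
    """Report which network outputs contain NaN/Inf at iteration `nbatch`.

    Assumes `preds` is a list of mxnet NDArray outputs, as passed to the
    batch updater's forward() after each forward pass.
    """
    for i, pred in enumerate(preds):
        arr = pred.asnumpy()
        n_bad = np.size(arr) - np.isfinite(arr).sum()
        if n_bad > 0:
            print("iter %d: output %d has %d non-finite values out of %d"
                  % (nbatch, i, n_bad, np.size(arr)))
```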

Has anybody successfully trained the network on the LINEMOD or OCCLUSION datasets?

nhafez · Oct 11 '18 14:10

I ran train_and_test_deepim_all.sh and did not encounter this error. Can you provide more information, such as the context in which the error occurs? Does it fail in the first iteration or after a few iterations?

liyi14 · Oct 12 '18 02:10

The first batch completes one iteration, and on the next one this error happens because the predictions are NaN.

nhafez · Oct 12 '18 09:10

Can you change frequent under default in experiments/deepim/cfg/*_any/all.yaml (abbreviated as config below) to 1, then try the following modifications separately and tell me the result of each:

  1. rerun train_and_test_deepim_all.sh
  2. run with train_test_deepim_ape.yaml
  3. change train_iter_size in config->network to 1 and rerun any config that reported the error before
  4. replace dataset: LM6D_REFINE+LM6D_REFINE_SYN with dataset: LM6D_REFINE, and image_set: train_+train_ with image_set: train_
  5. change config->TRAIN->warmup_lr to 0.0

Please tell me what happens after applying each of these modifications; the yaml sketch after this list shows roughly where these keys live. Thank you.
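
Not from the thread itself, but to make the checklist above concrete, the keys it refers to live roughly as sketched below in the yaml config. Treat this as an illustration of the key paths, not a verbatim copy of the repo's config file:

```yaml
default:
  frequent: 1           # log every iteration
network:
  train_iter_size: 1    # item 3
dataset:
  dataset: LM6D_REFINE  # item 4 (instead of LM6D_REFINE+LM6D_REFINE_SYN)
  image_set: train_     # item 4 (instead of train_+train_)
TRAIN:
  warmup_lr: 0.0        # item 5
```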

liyi14 · Oct 13 '18 00:10