smplify-x
NaN loss value, stopping! --> File "/home/mona/research/code/smplify-x/smplifyx/fit_single_frame.py", line 366, in fit_single_frame tqdm.write('Camera initialization final loss {:.4f}'.format( TypeError: unsupported format string passed to NoneType.__format__
After processing some images successfully, I got this error. Is there a fix for it?
Processing: ../../data/smplify-x/djrn_test_data/images/HICO_test2015_00000469.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 0.7507
Camera initialization final loss 1283.9620
Stage 000 done after 2.0166 seconds
Stage 001 done after 1.8066 seconds
Stage 002 done after 1.7855 seconds
Stage 003 done after 6.2607 seconds
Stage 004 done after 6.4354 seconds
Stage: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:18<00:00, 3.66s/it]
Body fitting Orientation 0 done after 18.3112 seconds
Body final loss val = 7074.32373
Orientation: 100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:18<00:00, 18.31s/it]
Processing: ../../data/smplify-x/djrn_test_data/images/HICO_test2015_00000470.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 0.9054
Camera initialization final loss 805.9324
Stage 000 done after 2.0088 seconds
Stage 001 done after 0.6831 seconds
Stage 002 done after 2.9054 seconds
Stage 003 done after 7.4240 seconds
Stage 004 done after 1.3566 seconds
Stage: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:14<00:00, 2.88s/it]
Body fitting Orientation 0 done after 14.3841 seconds
Body final loss val = 1076.25085
Orientation: 100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:14<00:00, 14.38s/it]
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 0.8964
Camera initialization final loss 10146.6113
Stage 000 done after 2.3258 seconds
Stage 001 done after 1.5916 seconds
Stage 002 done after 3.7246 seconds
Stage 003 done after 13.3136 seconds
Stage 004 done after 3.2600 seconds
Stage: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:24<00:00, 4.84s/it]
Body fitting Orientation 0 done after 24.2220 seconds
Body final loss val = 196141.98438
Orientation: 100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:24<00:00, 24.22s/it]
Processing: ../../data/smplify-x/djrn_test_data/images/HICO_test2015_00000471.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
NaN loss value, stopping!
Camera initialization done after 1.2850
Traceback (most recent call last):
File "smplifyx/main.py", line 272, in <module>
main(**args)
File "smplifyx/main.py", line 245, in main
fit_single_frame(img, keypoints[[person_id]],
File "/home/mona/research/code/smplify-x/smplifyx/fit_single_frame.py", line 366, in fit_single_frame
tqdm.write('Camera initialization final loss {:.4f}'.format(
TypeError: unsupported format string passed to NoneType.__format__
https://github.com/DirtyHarryLYL/DJ-RN/issues/28
@vchoutas Do you know why?
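The traceback is consistent with the fitting step returning None once the NaN check fires, after which the '{:.4f}'.format(...) call fails. Below is a minimal sketch of a guard for that logging line, assuming the loss can be None. The helper name format_loss is hypothetical and is not part of smplify-x.

```python
import math


def format_loss(loss_val):
    """Format a loss for logging, tolerating None and NaN (hypothetical helper)."""
    if loss_val is None:
        return 'n/a (fitting stopped early)'
    loss_val = float(loss_val)
    if math.isnan(loss_val):
        return 'NaN'
    return '{:.4f}'.format(loss_val)


# Reproduce the failure from the traceback: a float format spec applied to None.
try:
    'Camera initialization final loss {:.4f}'.format(None)
except TypeError as err:
    print('TypeError:', err)

# The guarded version logs a placeholder instead of crashing.
print('Camera initialization final loss ' + format_loss(None))
print('Camera initialization final loss ' + format_loss(1283.962))
```

This only keeps the log line from raising. The NaN in the optimization itself still has to be handled separately.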
Out of curiosity, I ran SMPLify-X on just one of the images that had produced a NaN loss, and it produced meshes without any error.
So I am confused about why I get a NaN error when the image is in a folder along with other images.
Here's another example: when this image was in a folder with more than 6K images, I got a NaN error.
[2440:2429 0:2010] 09:34:34 Wed Jan 13 [mona@goku:pts/0 +1] ~/research/code/smplify-x
$ ./bad_image_fit2.sh
Processing: ../../data/smplify-x/BAD_DATA_FOLDER2/images/HICO_test2015_00000470.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
/home/mona/research/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py:238: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, *, Number alpha) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
p.data.add_(step_size, update[offset:offset + numel].view_as(p.data))
Camera initialization done after 0.8730
Camera initialization final loss 732.2143
Stage 000 done after 1.7267 seconds
Stage 001 done after 1.2796 seconds
Stage 002 done after 2.4069 seconds
Stage 003 done after 9.2062 seconds
Stage 004 done after 1.0497 seconds
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:15<00:00, 3.13s/it]
Body fitting Orientation 0 done after 15.6752 seconds
Body final loss val = 1005.60815
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:15<00:00, 15.68s/it]
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 0.9409
Camera initialization final loss 9928.3438
Stage 000 done after 2.7342 seconds
Stage 001 done after 1.5152 seconds
Stage 002 done after 2.9879 seconds
Stage 003 done after 25.6505 seconds
Stage 004 done after 3.0541 seconds
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:35<00:00, 7.19s/it]
Body fitting Orientation 0 done after 35.9483 seconds
Body final loss val = 148578.62500
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:35<00:00, 35.95s/it]
Processing the data took: 00 hours, 00 minutes, 57 seconds
22473/31772MB(smplifyx)
[2440:2429 0:2011] 09:35:39 Wed Jan 13 [mona@goku:pts/0 +1] ~/research/code/smplify-x
$ cat bad_image_fit2.sh
export CUDA_VISIBLE_DEVICES=0
python smplifyx/main.py --config cfg_files/fit_smplx.yaml --data_folder ../../data/smplify-x/BAD_DATA_FOLDER2 --output_folder ../../data/smplify-x/BAD_RESULTS2 --visualize="False" --model_folder ../../data/smplify-x/models_smplx_v1_1/models/smplx/SMPLX_NEUTRAL.npz --vposer_ckpt ../../data/smplify-x/vposer_v1_0 --part_segm_fn ../../data/smplify-x/smplx_parts_segm.pkl

Consider using one of the following signatures instead:
add_(Tensor other, *, Number alpha) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
p.data.add_(step_size, update[offset:offset + numel].view_as(p.data))
Camera initialization done after 0.8836
Camera initialization final loss 7888.6914
Stage 000 done after 2.7946 seconds
Stage 001 done after 2.0227 seconds
Stage 002 done after 2.7113 seconds
Stage 003 done after 13.2658 seconds
Stage 004 done after 25.5132 seconds
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:46<00:00, 9.26s/it]
Body fitting Orientation 0 done after 46.3134 seconds
Body final loss val = 10309.36621
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:46<00:00, 46.31s/it]
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 1.0365
Camera initialization final loss 6034.3271
Stage 000 done after 3.4631 seconds
Stage 001 done after 0.9172 seconds
Stage 002 done after 1.3371 seconds
Stage 003 done after 11.0011 seconds
Stage 004 done after 15.4012 seconds
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:32<00:00, 6.43s/it]
Body fitting Orientation 0 done after 32.1260 seconds
Body final loss val = 3179.14966
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:32<00:00, 32.13s/it]
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 1.4423
Camera initialization final loss 242.5790
Stage 000 done after 2.3764 seconds
Stage 001 done after 1.1786 seconds
Stage 002 done after 2.9422 seconds
Stage 003 done after 8.2292 seconds
Stage 004 done after 10.8499 seconds
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:25<00:00, 5.12s/it]
Body fitting Orientation 0 done after 25.5825 seconds
Body final loss val = 4567.84961
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:25<00:00, 25.58s/it]
Processing the data took: 00 hours, 01 minutes, 51 seconds
If you check the first log, the image that generates the NaN error is HICO_test2015_00000471.jpg. The other images (e.g., HICO_test2015_00000470.jpg) run without problems even when you have all the images in the same folder. If HICO_test2015_00000471.jpg is the only "bad" image, you could remove it from the folder and try again without it.
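For large folders, one option is to record failing images and keep going instead of removing them by hand. Below is a minimal sketch, assuming a per-image fitting callable. fit_fn and run_batch are hypothetical names, not smplify-x functions.

```python
import traceback


def run_batch(image_paths, fit_fn):
    """Call fit_fn on every path; collect failures instead of aborting the batch.

    fit_fn stands in for whatever fits a single image (hypothetical). It is
    expected to raise an exception or return None when the fit fails.
    """
    succeeded, failed = [], []
    for path in image_paths:
        try:
            result = fit_fn(path)
        except Exception:  # also covers TypeError and MemoryError
            failed.append((path, traceback.format_exc()))
            continue
        if result is None:
            failed.append((path, 'fit returned None'))
        else:
            succeeded.append(path)
    return succeeded, failed


def fake_fit(path):
    """Toy stand-in that fails on one image, mimicking the first log."""
    if path.endswith('00000471.jpg'):
        raise TypeError('unsupported format string passed to NoneType.__format__')
    return {'path': path}


ok, bad = run_batch(
    ['HICO_test2015_00000470.jpg', 'HICO_test2015_00000471.jpg'], fake_fit)
print('succeeded:', ok)
print('failed:', [path for path, _ in bad])
```

The failed list can then be re-run separately or inspected later.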
@geopavlakos thanks a lot for your response.
For that specific image, I got this error on my local machine: https://pastebin.com/raw/ezfy7Jwu
[2440:2429 0:2108] 10:16:17 Wed Jan 13 [mona@goku:pts/0 +1] ~/research/code/smplify-x
$ ./bad_image_fit4.sh
Processing: ../../data/smplify-x/BAD_DATA_FOLDER4/images/HICO_test2015_00000471.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
/home/mona/research/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py:238: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, *, Number alpha) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
p.data.add_(step_size, update[offset:offset + numel].view_as(p.data))
Camera initialization done after 1.1937
Camera initialization final loss 395.8522
Stage 000 done after 2.0787 seconds
Stage 001 done after 0.9554 seconds
Stage 002 done after 2.9617 seconds
Stage: 60%|██████████████████████████████████████████████▊ | 3/5 [00:06<00:04, 2.03s/it]
Orientation: 0%| | 0/2 [00:06<?, ?it/s]
Traceback (most recent call last):
File "smplifyx/main.py", line 272, in <module>
main(**args)
File "smplifyx/main.py", line 245, in main
fit_single_frame(img, keypoints[[person_id]],
File "/home/mona/research/code/smplify-x/smplifyx/fit_single_frame.py", line 439, in fit_single_frame
final_loss_val = monitor.run_fitting(
File "/home/mona/research/code/smplify-x/smplifyx/fitting.py", line 175, in run_fitting
loss = optimizer.step(closure)
File "/home/mona/research/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py", line 393, in step
loss, flat_grad, t, ls_func_evals = _strong_Wolfe(obj_func, x_init, t, d,
File "/home/mona/research/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py", line 46, in _strong_Wolfe
f_new, g_new = obj_func(x, t, d)
File "/home/mona/research/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py", line 392, in obj_func
return self._directional_evaluate(closure, x, t, d)
File "/home/mona/research/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py", line 251, in _directional_evaluate
loss = float(closure())
File "/home/mona/research/code/smplify-x/smplifyx/fitting.py", line 246, in fitting_func
total_loss = loss(body_model_output, camera=camera,
File "/home/mona/venv/smplifyx/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/mona/research/code/smplify-x/smplifyx/fitting.py", line 434, in forward
collision_idxs = self.search_tree(triangles)
File "/home/mona/venv/smplifyx/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/mona/venv/smplifyx/lib/python3.8/site-packages/mesh_intersection/bvh_search_tree.py", line 56, in forward
return BVHFunction.apply(triangles)
File "/home/mona/venv/smplifyx/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/home/mona/venv/smplifyx/lib/python3.8/site-packages/mesh_intersection/bvh_search_tree.py", line 38, in forward
outputs = bvh_cuda.forward(triangles,
MemoryError: std::bad_alloc: cudaErrorMemoryAllocation: out of memory
20711/31772MB(smplifyx)
In contrast, I got no error when I ran it on the server with 12 GB of GPU memory.
(smplifyx) mona@ubuntu:~/mona/code/smplify-x$ ./bad_image_fit4.sh
Processing: ../../data/smplify-x/BAD_DATA_FOLDER4/images/HICO_test2015_00000471.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
~/mona/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py:238: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, *, Number alpha) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
p.data.add_(step_size, update[offset:offset + numel].view_as(p.data))
Camera initialization done after 1.7885
Camera initialization final loss 395.8533
Stage 000 done after 3.6366 seconds
Stage 001 done after 1.6833 seconds
Stage 002 done after 6.2464 seconds
Stage 003 done after 6.9281 seconds
Stage 004 done after 12.8438 seconds
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:31<00:00, 6.27s/it]
Body fitting Orientation 0 done after 31.3518 seconds
Body final loss val = 1509.08716
Orientation: 50%|████████████████████████████████████ | 1/2 [00:31<00:31, 31.35s/it]/home/mona/venv/smplifyx/lib/python3.6/site-packages/smplx/body_models.py:270: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
param[:] = torch.tensor(params_dict[param_name])
Stage 000 done after 4.3290 seconds
Stage 001 done after 1.2125 seconds
Stage 002 done after 5.2780 seconds
Stage 003 done after 7.2431 seconds
Stage 004 done after 5.2169 seconds
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:23<00:00, 4.66s/it]
Body fitting Orientation 1 done after 23.2944 seconds
Body final loss val = 1509.74951
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 2/2 [00:54<00:00, 27.32s/it]
Processing the data took: 00 hours, 01 minutes, 02 seconds
The other image, which gave me a NaN on the 12 GB GPU server (https://github.com/DirtyHarryLYL/DJ-RN/issues/28#issuecomment-759194536), runs without error on my local machine when I run it in isolation.
Also, when I take the exact same image that threw a NaN error when run alongside a batch of images on the server and run it in isolation on the server, I get no error. I am baffled as to why running it along with other images causes this problem. Could you please walk me through this?
(smplifyx) mona@ubuntu:~/mona/code/smplify-x$ ./bad_image_fit.sh
Processing: ../../data/smplify-x/BAD_DATA_FOLDER/images/HICO_test2015_00001357.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
~/mona/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py:238: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, *, Number alpha) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
p.data.add_(step_size, update[offset:offset + numel].view_as(p.data))
Camera initialization done after 2.0274
Camera initialization final loss 210.5556
Stage 000 done after 5.9732 seconds
Stage 001 done after 0.7499 seconds
Stage 002 done after 4.9734 seconds
Stage 003 done after 7.4580 seconds
Stage 004 done after 17.3789 seconds
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:36<00:00, 7.31s/it]
Body fitting Orientation 0 done after 36.5480 seconds
Body final loss val = 1884.72766
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:36<00:00, 36.55s/it]
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 1.4444
Camera initialization final loss 179.2693
Stage 000 done after 3.3054 seconds
Stage 001 done after 1.9925 seconds
Stage 002 done after 5.2941 seconds
Stage 003 done after 7.1860 seconds
Stage 004 done after 3.8547 seconds
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:21<00:00, 4.33s/it]
Body fitting Orientation 0 done after 21.6465 seconds
Body final loss val = 708.07715
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:21<00:00, 21.65s/it]
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 2.0496
Camera initialization final loss 176.3299
Stage 000 done after 3.2555 seconds
Stage 001 done after 3.2203 seconds
Stage 002 done after 4.7340 seconds
Stage 003 done after 5.5548 seconds
Stage 004 done after 2.5345 seconds
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:19<00:00, 3.86s/it]
Body fitting Orientation 0 done after 19.3133 seconds
Body final loss val = 350.72900
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:19<00:00, 19.31s/it]
Processing the data took: 00 hours, 01 minutes, 29 seconds
(smplifyx) mona@ubuntu:~/mona/data/smplify-x/BAD_DATA_FOLDER$ ls images/
total 168K
drwxrwxr-x 1 mona mona 30 Jan 13 19:31 ..
-rwxrwxr-x 1 mona mona 165K Jan 13 19:31 HICO_test2015_00001357.jpg
drwxrwxr-x 1 mona mona 52 Jan 13 19:31 .
and
(smplifyx) mona@ubuntu:~/mona/data/smplify-x/BAD_DATA_FOLDER4$ ls images/
total 76K
drwxrwxr-x 1 mona mona 30 Jan 13 19:22 ..
-rwxrwxr-x 1 mona mona 75K Jan 13 19:22 HICO_test2015_00000471.jpg
drwxrwxr-x 1 mona mona 52 Jan 13 19:22 .
Here's image HICO_test2015_00000471.jpg
Please note that I ran the problematic image once again on my local machine and it threw no error. This is very inconsistent. Is there a recommended way to handle these failures when I am trying to run it on around 30K images?
$ ./bad_image_fit4.sh
Processing: ../../data/smplify-x/BAD_DATA_FOLDER4/images/HICO_test2015_00000471.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
/home/mona/research/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py:238: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, *, Number alpha) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
p.data.add_(step_size, update[offset:offset + numel].view_as(p.data))
Camera initialization done after 1.2542
Camera initialization final loss 395.8522
Stage 000 done after 2.2228 seconds
Stage 001 done after 0.9547 seconds
Stage 002 done after 3.0120 seconds
Stage 003 done after 7.5243 seconds
Stage 004 done after 9.8090 seconds
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:23<00:00, 4.71s/it]
Body fitting Orientation 0 done after 23.5286 seconds
Body final loss val = 1509.15723
Orientation: 50%|████████████████████████████████████ | 1/2 [00:23<00:23, 23.53s/it]/home/mona/venv/smplifyx/lib/python3.8/site-packages/smplx/body_models.py:270: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
param[:] = torch.tensor(params_dict[param_name])
Stage 000 done after 2.2364 seconds
Stage 001 done after 1.2320 seconds
Stage 002 done after 4.3765 seconds
Stage 003 done after 8.0905 seconds
Stage 004 done after 9.9109 seconds
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:25<00:00, 5.17s/it]
Body fitting Orientation 1 done after 25.8622 seconds
Body final loss val = 1509.17957
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 2/2 [00:49<00:00, 24.70s/it]
Processing the data took: 00 hours, 00 minutes, 54 seconds
Here's image HICO_test2015_00001357.jpg
Here are its meshes produced on the server (also was able to do so on my local machine) in isolation:
here's the resulting mesh
Overall, then, my question remains: how can I run SMPLify-X on a large number of images, given this degree of inconsistency in the results, without being tripped up by NaN losses?
@monacv The NaN result is mostly due to a bad OpenPose prior. You should run OpenPose first on these images with hand and face keypoints enabled. If those are missing, OpenPose won't give a complete 118-joint output for the full body and SMPL-X would fail.
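A quick pre-check of the keypoint files can flag such images before fitting. Below is a minimal sketch, assuming the standard OpenPose JSON layout (a "people" list with flat [x, y, confidence, ...] arrays). The rule of "at least one keypoint above min_conf per group" is an assumption chosen for illustration, not a threshold used by smplify-x.

```python
import json

# Keypoint groups written by OpenPose when body, hands and face are enabled.
REQUIRED_PARTS = (
    'pose_keypoints_2d',
    'hand_left_keypoints_2d',
    'hand_right_keypoints_2d',
    'face_keypoints_2d',
)


def missing_parts(person, min_conf=0.0):
    """Return the keypoint groups that are absent or have no confident detection."""
    missing = []
    for part in REQUIRED_PARTS:
        values = person.get(part) or []
        confidences = values[2::3]  # every third value is a confidence score
        if not any(c > min_conf for c in confidences):
            missing.append(part)
    return missing


def check_keypoint_json(text):
    """Map each detected person index to its list of missing keypoint groups."""
    people = json.loads(text).get('people', [])
    return {idx: missing_parts(person) for idx, person in enumerate(people)}


# Tiny synthetic example: left hand has zero confidence, right hand is empty.
sample = json.dumps({'people': [{
    'pose_keypoints_2d': [100.0, 200.0, 0.9],
    'hand_left_keypoints_2d': [0.0, 0.0, 0.0],
    'hand_right_keypoints_2d': [],
    'face_keypoints_2d': [50.0, 60.0, 0.7],
}]})
report = check_keypoint_json(sample)
print(report)
```

Images whose report lists missing groups could be skipped or re-run through OpenPose before fitting.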
Hello,
First, thanks a lot for your work, this is an impressive tool.
I do have a similar problem: I run smplify-x on frames extracted from a video, and I get a "NaN loss value, stopping" error seemingly at random.
I run smplify-x on CPU as it is much faster than GPU, as mentioned in https://github.com/vchoutas/smplify-x/issues/163.
The 3D mesh files end up with nan values on every line starting with v (the first 6980 lines of the file), and the fields ["Camera translation", "betas", "global_orient", "body_pose", "joints", "vertices"] are all NaN in the pkl files.
It affects every single frame processed after the first one that gets this error, which is surprising since, as far as I'm aware, smplify should process frames independently of each other.
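To find where a run went bad, the output meshes can be scanned for non-finite vertices. Below is a minimal sketch that works on OBJ text. The function names are hypothetical, and treating an unparsable coordinate as corrupt is an assumption.

```python
import math


def obj_has_nan_vertices(obj_text):
    """Return True if any 'v x y z' line in OBJ text has a non-finite coordinate."""
    for line in obj_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == 'v':
            try:
                coords = [float(tok) for tok in parts[1:4]]
            except ValueError:
                return True  # assumption: an unparsable coordinate means a corrupt file
            if not all(math.isfinite(c) for c in coords):
                return True
    return False


def first_bad_frame(frames):
    """frames: ordered (name, obj_text) pairs. Return the first corrupt name or None."""
    for name, text in frames:
        if obj_has_nan_vertices(text):
            return name
    return None


# Synthetic meshes standing in for per-frame OBJ outputs.
good_mesh = 'v 0.1 0.2 0.3\nv 1.0 2.0 3.0\nf 1 2 1\n'
bad_mesh = 'v nan nan nan\nv nan nan nan\nf 1 2 1\n'
frames = [('frame_0001', good_mesh), ('frame_0002', bad_mesh), ('frame_0003', bad_mesh)]
print(first_bad_frame(frames))
```

In practice the text would come from reading each output file in frame order; processing can then be restarted from the first name returned.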
Note: I had previously made local changes so that processing is not interrupted on NaN loss errors, because of rare cases where openpose fails to detect a person in the image; my smplify does not exit when the error is raised but continues to the next frame instead. I guess I'll have to change that, but thanks to this I know that the "NaN loss value, stopping" error never propagated to the next frames when it occurred because of missing openpose data before. [Edit] I realised my local changes interrupted the loop before it got to the point of calculating the camera loss, which might explain why it could not propagate the NaN values.
For one video, the error might pop up at frame ~400, while for another, at frame ~5000, etc. For others, there is no error. I did not notice any difference between the last properly processed frame and the first one that throws the error, apart from small movements of the individual. No major difference either on the openpose data (to address the previous comment on the thread).
There is always only one person on the image, so it is a different situation from https://github.com/vchoutas/smplify-x/issues/142.
Furthermore, if I move the data to another folder, as mentioned in a message above, and then restart the process from the frame that started getting NaN values, smplify can process it just fine, though it can fail again and throw another "NaN loss value, stopping" error at any later frame.
I previously ran smplify-x on up to 29000 frames (~3000 on average) without errors. Using the same data sets, I started to get these errors when I changed the focal length from 5000 to 400. The reason for the change is that the estimated camera parameters varied a lot between two consecutive frames (jumps of up to ~4 meters in the estimated camera position). We reduced the focal length to a more realistic value, which made the estimated camera position more stable on the smaller image sets (~1000) we used for testing the change, without any NaN error occurring.
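As a rough illustration of why the assumed focal length changes the estimated camera distance, here is a sketch under a plain pinhole model, where depth is approximately f * H / h. The person height and pixel height are made-up numbers, not values from this thread, and this is not a claim about what triggers the NaN.

```python
def estimated_depth(focal_px, height_3d_m, height_2d_px):
    """Pinhole similar-triangles estimate: depth = f * H / h."""
    return focal_px * height_3d_m / height_2d_px


# Assumed example: a 1.7 m tall person who spans 400 pixels in the image.
for focal in (5000.0, 1000.0, 400.0):
    z = estimated_depth(focal, height_3d_m=1.7, height_2d_px=400.0)
    print('focal={:.0f}px -> depth ~ {:.2f} m'.format(focal, z))
```

With a large focal length the body is placed far from the camera, so a small change in apparent size maps to a large change in depth; a smaller focal length places the body much closer.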
It is weird that this would be the cause of the problem, since smplify runs properly with the same settings and input data if it is called again on the images that failed to be processed. I am currently running some sets with a focal length of 1000 to compare, just in case. I'll also test whether the errors happen at the same frames every time when processing the same image set.
[Edited on 09/03/2022] Changing the focal length to 1000 instead of 400 did remove the NaN loss value for some videos, but not for others. The NaN loss still occurs even for videos where the modified focal length seems adequate and produces much more accurate estimates. I guess my alternatives right now are either to increase the focal length to avoid interruptions, or to interrupt and restart smplify-x's processing at the frame where the NaN values start.