Gait3D-Benchmark
lib/modeling/models/smplgait.py throws an error when training on a new dataset
Hi Jinkai,
When I try to apply SMPLGait to another dataset, smplgait.py throws the following error during training:
smpls = ipts[1][0]  # [n, s, d]
IndexError: list index out of range
It is also interesting that I used 4 GPUs in the training: 3 of them could detect the ipts[1][0] tensor with size 1, but the fourth one failed to do so. Could I know how I can solve this?
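For reference, this is roughly the check I used to see which rank fails (placed just before "smpls = ipts[1][0]" in SMPLGait.forward; the print and the RuntimeError are my own diagnostics, not repo code):

import torch.distributed as dist

# Log what each DDP rank actually receives, since one of the four
# ranks seems to get a shorter ipts list than the others.
rank = dist.get_rank() if dist.is_initialized() else 0
print(f"[rank {rank}] len(ipts) = {len(ipts)}, "
      f"lens = {[len(x) for x in ipts]}")
if len(ipts) < 2 or len(ipts[1]) == 0:
    raise RuntimeError(f"[rank {rank}] SMPL stream missing from ipts")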
Hi~ Because the framework is based on DDP mode, it is recommended that you use only 1 GPU for debugging. This will make it easier to examine the problem.
Could I know how to modify the code to run with 1 GPU?
Just change the values of CUDA_VISIBLE_DEVICES and --nproc_per_node, like this:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 lib/main.py --cfgs ./config/smplgait_64pixel.yaml --phase train
Thank you! I have tried that, and the same error appeared. Do you have any guess as to why smpls could not retrieve the tensor information from ipts? I also keep getting this warning:
"/home/zhiyuann/Gait3D-Benchmark/lib/modeling/base_model.py:338: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray. for smpl in smpls_batch]"
Do you think that may contribute to the error?
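For what it's worth, I can reproduce the warning with plain NumPy, independent of the repo (the 85-element length is just an illustrative SMPL feature size):

import numpy as np

# Frames with matching shapes stack into a normal float array.
ok = np.asarray([np.zeros(85), np.zeros(85)])
print(ok.dtype)      # float64

# Frames with mismatched lengths cannot be stacked; without
# dtype=object this is exactly the call that emits
# VisibleDeprecationWarning (newer NumPy raises instead), and the
# result falls back to dtype=object.
ragged = np.asarray([np.zeros(85), np.zeros(84)], dtype=object)
print(ragged.dtype)  # object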
I recommend you start at the data source and go step by step to find out what is causing the missing SMPL data.
Hi Jinkai,
I retraced the error and found that it happens in base_model.py, when the smpls are pretreated with this code:
smpls = [np2var(np.asarray([fra for fra in smpl]), requires_grad=requires_grad).float() for smpl in smpls_batch]
It throws this error:
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
It still throws the error even if I change the dtype to float16, as the trainer_cfg indicates. Do you know what may be causing this?
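A minimal reproduction of that TypeError, showing it is independent of any float16 setting (torch.from_numpy here stands in for whatever np2var wraps, which is an assumption on my side):

import numpy as np
import torch

frames = np.asarray([np.zeros(85), np.zeros(84)], dtype=object)

# torch only accepts numeric/bool dtypes from NumPy, so an object
# array fails before any float16/float32 cast is ever applied.
try:
    torch.from_numpy(frames)
except TypeError as e:
    print(e)  # can't convert np.ndarray of type numpy.object_ ...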
The "enable_float16" in trainer_cfg aims to memory reduction and speed up.
Maybe you can try:
smpls = [np2var(np.asarray([fra for fra in smpl]).astype(float), requires_grad=requires_grad).float() for smpl in smpls_batch]
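One caveat worth noting (this is about NumPy behavior in general, not specific to the repo): the .astype(float) cast only succeeds when the object array still has a regular shape, e.g. numeric SMPL parameters that merely got stored with dtype=object. If the frames are truly ragged, the cast itself raises, and the data has to be fixed where the SMPL files are generated:

import numpy as np

# Regular shape, just stored as dtype=object: the suggested cast works.
uniform = np.zeros((4, 85)).astype(object)
print(uniform.astype(float).dtype)   # float64

# Truly ragged frames: the cast itself fails.
ragged = np.asarray([np.zeros(85), np.zeros(84)], dtype=object)
try:
    ragged.astype(float)
except (TypeError, ValueError) as e:
    print(type(e).__name__, e)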