IndexError: index 676 is out of bounds for dimension 0 with size 676
| load 'model' from 'checkpoints/audio2motion_vae/model_ckpt_steps_400000.ckpt', strict=True
| WARN: checkpoints/motion2video_nerf/may_torso/lm3d_radnerf_torso.yaml not exist.
| load 'model' from 'checkpoints/motion2video_nerf/may_torso/model_ckpt_steps_250000.ckpt', strict=True
trainval: Smooth head trajectory (rotation and translation) with a window size of 7
/data/zssy-digital-human/projects/gpp/tasks/radnerfs/dataset_utils.py:263: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
self.lm68s = torch.tensor(self.lm2ds[:, index_lm68_from_lm478, :])
Extracted wav file (16khz) from data/raw/val_wavs/8-27s.wav to data/raw/val_wavs/8-27s_16k.wav.
Loading the HuBERT Model...
/data/home/yaokj5/anaconda3/envs/geneface/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.get(instance, owner)()
Loading the Wav2Vec2 Processor...
Traceback (most recent call last):
File "/data/zssy-digital-human/projects/gpp/inference/genefacepp_infer.py", line 542, in
This looks like an error caused by an index going out of bounds. Can you provide more details? The code should already have converted the audio to 16 kHz and the video to 25 fps.
@lokvke, could you please attempt it using an audio file longer than 10 seconds? In my testing, it consistently fails when the provided audio is less than 8 seconds.
Hey @yerfor. I tried running with longer audio clips as well. For the same audio clip, I tried the full length (around 1 min 30 s) and a 59 s segment; both failed with a similar error. Only the index value mentioned was different (though it stayed the same across multiple runs). It did seem to work for a sample that was around 40 s long. All samples were encoded to 16 kHz successfully and, as far as I can tell, the error happens on the exact same line. Is there any other detail I can provide to help debug this issue?
Hi, I have the same problem. The IndexError always appears, with the same error for driving audio of different lengths. How can I solve this problem?
In the inject_blink_to_lm68 function, when the generated video contains 676 frames, T=676. So when i=675 and j=1, idx=676, which is out of bounds. Here is my solution:
idx = i % (i + j)
(PS: the blinking result does not seem very natural.)
- Hi, thanks for your comment. I will update the mentioned modification in the latest commit.
- As for the blinking results, the blink motion is controlled by the hard-coded
blink_factor_lst = np.array([0.1, 0.5, 0.7, 1.0, 0.7, 0.5, 0.1]) # * 0.9
in the inject_blink_to_lm68 function. Maybe you can try different values to improve the naturalness of the eye blink.
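For reference, a bounds-checked version of the blink loop might look like the sketch below. It is not the repository's code: the function name `inject_blink`, the `period` and `offset` parameters, and the plain NumPy inputs are assumptions made for illustration; only `blink_factor_lst`, `T`, `i`, `j`, and `idx` come from the comments above. Note that `i % (i + j)` evaluates to `i` for every `j >= 1`, so that expression avoids the IndexError but applies several blink factors to the same frame, which may contribute to the unnatural blink. Stopping at the last frame keeps the ramp intact.

```python
import numpy as np

# Blink ramp quoted in the thread (the hard-coded values from the maintainer's reply).
blink_factor_lst = np.array([0.1, 0.5, 0.7, 1.0, 0.7, 0.5, 0.1])


def inject_blink(open_eye, closed_eye, period=100, offset=25):
    """Blend open-eye frames toward closed-eye frames at periodic blink starts.

    Hypothetical helper: `period` and `offset` are illustrative assumptions,
    not values confirmed by the thread.
    """
    out = open_eye.copy()
    T = len(out)
    for i in range(T):
        if (i + offset) % period != 0:
            continue  # frame i does not start a blink
        for j, factor in enumerate(blink_factor_lst):
            idx = i + j
            if idx >= T:
                break  # blink would run past the last frame; stop instead of indexing out of bounds
            out[idx] = open_eye[idx] * (1 - factor) + closed_eye[idx] * factor
    return out


# With T=676 a blink starts at i=675, so idx=676 would be out of bounds without the check.
open_eye = np.ones((676, 2))
closed_eye = np.zeros((676, 2))
blended = inject_blink(open_eye, closed_eye)
```

With this guard, a blink that begins near the end of the clip is simply truncated rather than wrapped back onto earlier frames.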
I have the same problem
(geneface) hawk@R740:~/GeneFacePlusPlus$ python inference/genefacepp_infer.py --a2m_ckpt=checkpoints/audio2motion_vae --head_ckpt= --torso_ckpt=checkpoints/motion2video_nerf/lxl_torso --drv_aud=data/raw/val_wavs/ioslow.wav --out_name=lxl_demo.mp4 --low_memory_usage
| WARN: egs/egs_bases/audio2motion/vae.yaml not exist.
| WARN: checkpoints/th1kh_512_audio2motion/base.yaml not exist.
| Hparams: {
"accumulate_grad_batches": 1,
"amp": false,
"audio_type": "hubert",
"base_config": [
"egs/egs_bases/audio2motion/vae.yaml",
"../th1kh_512_audio2motion/base.yaml"
],
"batch_size": 4,
"binarization_args": {
"with_coeff": true,
"with_hubert": true,
"with_mel": true
},
"binary_data_dir": "data/binary/voxceleb2_audio2motion",
"blink_mode": "blink_unit",
"clip_grad_norm": 1,
"clip_grad_value": 0,
"debug": false,
"ds_name": "TH1KH_512",
"eval_max_batches": 10,
"exp_name": "",
"gen_dir_name": "",
"hidden_size": 256,
"infer": false,
"infer_audio_source_name": "",
"infer_ckpt_steps": 40000,
"infer_out_npy_name": "",
"init_from_ckpt": "",
"init_method": "tcp",
"lambda_kl": 0.02,
"lambda_kl_t1": 2000,
"lambda_kl_t2": 2000,
"lambda_l2_reg_exp": 0.1,
"lambda_mse_exp": 1.0,
"lambda_mse_lm2d": 0.0,
"lambda_mse_lm3d": 0.0,
"load_ckpt": "",
"load_db_to_memory": false,
"lr": 0.0005,
"max_sentences_per_batch": 512,
"max_tokens_per_batch": 20000,
"max_updates": 400000,
"motion_type": "exp",
"num_ckpt_keep": 100,
"num_sanity_val_steps": 5,
"num_valid_plots": 1,
"num_workers": 4,
"optimizer_adam_beta1": 0.9,
"optimizer_adam_beta2": 0.999,
"print_nan_grads": false,
"process_id": 0,
"raw_data_dir": "/home/tiger/datasets/raw/TH1KH_512",
"ref_id_mode": "first_frame",
"resume_from_checkpoint": 0,
"sample_min_length": 32,
"save_best": false,
"save_codes": [
"tasks",
"modules",
"egs"
],
"save_gt": true,
"scheduler": "exponential",
"seed": 9999,
"smo_win_size": 5,
"split_seed": 999,
"start_rank": 0,
"syncnet_ckpt_dir": "checkpoints/0904_syncnet/syncnet_hubert_vox2",
"task_cls": "tasks.os_avatar.audio2secc_task.Audio2SECCTask",
"tb_log_interval": 100,
"total_process": 1,
"use_eye_amp_embed": false,
"use_flow": true,
"use_fork": true,
"use_kv_dataset": true,
"use_mouth_amp_embed": true,
"use_pitch": true,
"val_check_interval": 2000,
"valid_infer_interval": 2000,
"valid_monitor_key": "val_loss",
"valid_monitor_mode": "min",
"validate": false,
"warmup_updates": 1000,
"weight_decay": 0,
"work_dir": "",
"world_size": -1
}
| load 'model' from 'checkpoints/audio2motion_vae/model_ckpt_steps_400000.ckpt', strict=True
| WARN: checkpoints/motion2video_nerf/lxl_torso/lm3d_radnerf_torso.yaml not exist.
| load 'model' from 'checkpoints/motion2video_nerf/lxl_torso/model_ckpt_steps_250000.ckpt', strict=True
trainval: Smooth head trajectory (rotation and translation) with a window size of 7
/home/hawk/GeneFacePlusPlus/tasks/radnerfs/dataset_utils.py:266: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
self.lm68s = torch.tensor(self.lm2ds[:, index_lm68_from_lm478, :])
/home/hawk/GeneFacePlusPlus/inference/genefacepp_infer.py:184: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
eye_area_percents = torch.tensor(self.dataset.eye_area_percents)
Extracted wav file (16khz) from data/raw/val_wavs/ioslow.wav to data/raw/val_wavs/ioslow_16k.wav.
Loading the HuBERT Model...
/home/hawk/miniconda3/envs/geneface/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
/home/hawk/miniconda3/envs/geneface/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.get(instance, owner)()
Loading the Wav2Vec2 Processor...
Traceback (most recent call last):
File "/home/hawk/GeneFacePlusPlus/inference/genefacepp_infer.py", line 593, in
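After testing, I found that the likely cause of this problem is a channel mismatch between the video and the audio: the inference video is mono, while the audio used for inference is stereo. Changing the audio to a matching channel layout solved the problem for me.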
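A quick way to check the channel count of a driving audio file, and to downmix it to mono if needed, is sketched below. It assumes 16-bit PCM WAV input and uses only the standard-library `wave` module plus NumPy; the helper names `channel_count` and `to_mono` are hypothetical. Whether a channel mismatch is the actual cause in every case is not confirmed by this thread.

```python
import wave

import numpy as np


def channel_count(path):
    """Return the number of audio channels in a WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels()


def to_mono(src, dst):
    """Downmix a 16-bit PCM WAV file to mono by averaging its channels."""
    with wave.open(src, "rb") as wf:
        n_channels = wf.getnchannels()
        sample_width = wf.getsampwidth()
        frame_rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    # Interleaved samples -> one row per frame, one column per channel.
    samples = np.frombuffer(frames, dtype="<i2").reshape(-1, n_channels)
    # Average in a wider integer type to avoid overflow, then cast back to 16-bit.
    mono = samples.astype(np.int32).mean(axis=1).astype("<i2")
    with wave.open(dst, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(frame_rate)
        wf.writeframes(mono.tobytes())
```

For example, if `channel_count("data/raw/val_wavs/ioslow.wav")` returns 2, `to_mono` can write a single-channel copy to pass as `--drv_aud`.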
Is this issue fixed now?