
IndexError: index 676 is out of bounds for dimension 0 with size 676

Open lokvke opened this issue 1 year ago • 7 comments

| load 'model' from 'checkpoints/audio2motion_vae/model_ckpt_steps_400000.ckpt', strict=True
| WARN: checkpoints/motion2video_nerf/may_torso/lm3d_radnerf_torso.yaml not exist.
| load 'model' from 'checkpoints/motion2video_nerf/may_torso/model_ckpt_steps_250000.ckpt', strict=True
trainval: Smooth head trajectory (rotation and translation) with a window size of 7
/data/zssy-digital-human/projects/gpp/tasks/radnerfs/dataset_utils.py:263: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.lm68s = torch.tensor(self.lm2ds[:, index_lm68_from_lm478, :])
Extracted wav file (16khz) from data/raw/val_wavs/8-27s.wav to data/raw/val_wavs/8-27s_16k.wav.
Loading the HuBERT Model...
/data/home/yaokj5/anaconda3/envs/geneface/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Loading the Wav2Vec2 Processor...
Traceback (most recent call last):
  File "/data/zssy-digital-human/projects/gpp/inference/genefacepp_infer.py", line 542, in <module>
    GeneFace2Infer.example_run(inp)
  File "/data/zssy-digital-human/projects/gpp/inference/genefacepp_infer.py", line 490, in example_run
    infer_instance.infer_once(inp)
  File "/data/zssy-digital-human/projects/gpp/inference/genefacepp_infer.py", line 180, in infer_once
    out_name = self.forward_system(samples, inp)
  File "/data/home/yaokj5/anaconda3/envs/geneface/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/zssy-digital-human/projects/gpp/inference/genefacepp_infer.py", line 475, in forward_system
    self.forward_audio2secc(batch, inp)
  File "/data/home/yaokj5/anaconda3/envs/geneface/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/zssy-digital-human/projects/gpp/inference/genefacepp_infer.py", line 384, in forward_audio2secc
    cano_lm3d = inject_blink_to_lm68(cano_lm3d)
  File "/data/zssy-digital-human/projects/gpp/inference/genefacepp_infer.py", line 103, in inject_blink_to_lm68
    lm68[idx, 36:48] = lm68[idx, 36:48] * (1-blink_factor) + closed_eye_lm68[idx, 36:48] * blink_factor
IndexError: index 676 is out of bounds for dimension 0 with size 676

lokvke avatar Feb 05 '24 00:02 lokvke

It seems like an error caused by an out-of-bounds index. Can you provide more details? The code should have already converted the audio to 16 kHz and the video to 25 fps.

yerfor avatar Feb 05 '24 04:02 yerfor

@lokvke, could you please attempt it using an audio file longer than 10 seconds? In my testing, it consistently fails when the provided audio is less than 8 seconds.

Ahmer-444 avatar Feb 06 '24 00:02 Ahmer-444

Hey @yerfor. I tried running with longer audio clips as well. For the same audio clip, I tried the full length (around 1 min 30 s) and a 59 s segment; both failed with a similar error, only the index value mentioned was different (though it stayed the same across multiple runs). It did work for a sample that was around 40 s long. All samples were encoded to 16 kHz successfully, and as far as I can tell the error happens on the exact same line. Is there any other detail I can provide to help debug this issue?

AHarmlessPyro avatar Feb 09 '24 11:02 AHarmlessPyro

Hi, I have the same problem. The IndexError always appears, and it is the same error with different lengths of driving audio. How can I solve this?

benchrus avatar Feb 14 '24 01:02 benchrus

In the inject_blink_to_lm68 function, when the generated video contains 676 frames, T=676. So when i=675 and j=1, idx=676, which is out of range. Here is my solution:

idx = i % (i + j)

(P.S.: the blinking result does not look very natural.)
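For reference, here is a minimal, self-contained sketch of the indexing pattern behind this crash and a clamped alternative. It is not the repository's exact code: the function signature, the loop over blink_starts, and the closed_eye_lm68 argument are assumptions for illustration; only the blend line and blink_factor_lst come from the traceback and this thread.

```python
import numpy as np
import torch

# Blend weights for one blink window, as quoted later in this thread.
blink_factor_lst = np.array([0.1, 0.5, 0.7, 1.0, 0.7, 0.5, 0.1])

def inject_blink_sketch(lm68: torch.Tensor, closed_eye_lm68: torch.Tensor, blink_starts):
    """Blend the eye landmarks (rows 36:48) toward the closed-eye pose around each blink start.

    lm68, closed_eye_lm68: [T, 68, 3] landmark tensors; blink_starts: frame indices where blinks begin.
    """
    T = lm68.shape[0]
    for i in blink_starts:
        for j, blink_factor in enumerate(blink_factor_lst):
            # The reported crash: i + j can reach T (e.g. i=675, j=1 with T=676).
            # Clamping keeps the tail of the last blink on the final valid frame.
            idx = min(i + j, T - 1)
            lm68[idx, 36:48] = lm68[idx, 36:48] * (1 - blink_factor) \
                + closed_eye_lm68[idx, 36:48] * blink_factor
    return lm68
```

Note that idx = i % (i + j) maps j=0 to frame 0 and every later j back to frame i, so the whole blink collapses onto at most two frames; that may be why the result looks unnatural, whereas clamping with min only repeats the final frame when a blink would run past the end.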

lokvke avatar Feb 18 '24 02:02 lokvke

  • Hi, thanks for your comment. I will include this modification in the next commit.
  • As for the blinking results: the blink motion is controlled by the hard-coded blink_factor_lst = np.array([0.1, 0.5, 0.7, 1.0, 0.7, 0.5, 0.1]) # * 0.9 in the inject_blink_to_lm68 function. You could try different values to improve the naturalness of the eye blink (see the sketch below).
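If you want to experiment with those values, one option is to generate the profile instead of hand-tuning it. The following is only a sketch, not the repository's code: blink_profile, its parameters, and the raised-cosine shape are hypothetical choices; only the default array comes from this thread.

```python
import numpy as np

# The hard-coded profile quoted above: a 7-frame ramp up to a fully closed eye and back.
default_profile = np.array([0.1, 0.5, 0.7, 1.0, 0.7, 0.5, 0.1])

def blink_profile(num_frames: int = 7, peak: float = 1.0) -> np.ndarray:
    """Raised-cosine (Hann-like) blink profile: symmetric, reaching `peak` in the middle.

    A longer window gives a slower blink; a peak below 1.0 keeps the eye slightly open.
    """
    phase = np.arange(1, num_frames + 1) / (num_frames + 1)
    return peak * 0.5 * (1.0 - np.cos(2.0 * np.pi * phase))

print(blink_profile())           # 7 values with a shape close to the default, but smoother
print(blink_profile(11, 0.9))    # longer, slightly-less-than-fully-closed blink
```

Since each entry is the blend weight for one frame of the blink window, a longer profile simply spreads the blink over more frames.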

yerfor avatar Feb 18 '24 07:02 yerfor

I have the same problem

(geneface) hawk@R740:~/GeneFacePlusPlus$ python inference/genefacepp_infer.py --a2m_ckpt=checkpoints/audio2motion_vae --head_ckpt= --torso_ckpt=checkpoints/motion2video_nerf/lxl_torso --drv_aud=data/raw/val_wavs/ioslow.wav --out_name=lxl_demo.mp4 --low_memory_usage
| WARN: egs/egs_bases/audio2motion/vae.yaml not exist.
| WARN: checkpoints/th1kh_512_audio2motion/base.yaml not exist.
| Hparams: { "accumulate_grad_batches": 1, "amp": false, "audio_type": "hubert", "base_config": [ "egs/egs_bases/audio2motion/vae.yaml", "../th1kh_512_audio2motion/base.yaml" ], "batch_size": 4, "binarization_args": { "with_coeff": true, "with_hubert": true, "with_mel": true }, "binary_data_dir": "data/binary/voxceleb2_audio2motion", "blink_mode": "blink_unit", "clip_grad_norm": 1, "clip_grad_value": 0, "debug": false, "ds_name": "TH1KH_512", "eval_max_batches": 10, "exp_name": "", "gen_dir_name": "", "hidden_size": 256, "infer": false, "infer_audio_source_name": "", "infer_ckpt_steps": 40000, "infer_out_npy_name": "", "init_from_ckpt": "", "init_method": "tcp", "lambda_kl": 0.02, "lambda_kl_t1": 2000, "lambda_kl_t2": 2000, "lambda_l2_reg_exp": 0.1, "lambda_mse_exp": 1.0, "lambda_mse_lm2d": 0.0, "lambda_mse_lm3d": 0.0, "load_ckpt": "", "load_db_to_memory": false, "lr": 0.0005, "max_sentences_per_batch": 512, "max_tokens_per_batch": 20000, "max_updates": 400000, "motion_type": "exp", "num_ckpt_keep": 100, "num_sanity_val_steps": 5, "num_valid_plots": 1, "num_workers": 4, "optimizer_adam_beta1": 0.9, "optimizer_adam_beta2": 0.999, "print_nan_grads": false, "process_id": 0, "raw_data_dir": "/home/tiger/datasets/raw/TH1KH_512", "ref_id_mode": "first_frame", "resume_from_checkpoint": 0, "sample_min_length": 32, "save_best": false, "save_codes": [ "tasks", "modules", "egs" ], "save_gt": true, "scheduler": "exponential", "seed": 9999, "smo_win_size": 5, "split_seed": 999, "start_rank": 0, "syncnet_ckpt_dir": "checkpoints/0904_syncnet/syncnet_hubert_vox2", "task_cls": "tasks.os_avatar.audio2secc_task.Audio2SECCTask", "tb_log_interval": 100, "total_process": 1, "use_eye_amp_embed": false, "use_flow": true, "use_fork": true, "use_kv_dataset": true, "use_mouth_amp_embed": true, "use_pitch": true, "val_check_interval": 2000, "valid_infer_interval": 2000, "valid_monitor_key": "val_loss", "valid_monitor_mode": "min", "validate": false, "warmup_updates": 1000, "weight_decay": 0, "work_dir": "", "world_size": -1 }
| load 'model' from 'checkpoints/audio2motion_vae/model_ckpt_steps_400000.ckpt', strict=True
| WARN: checkpoints/motion2video_nerf/lxl_torso/lm3d_radnerf_torso.yaml not exist.
| load 'model' from 'checkpoints/motion2video_nerf/lxl_torso/model_ckpt_steps_250000.ckpt', strict=True
trainval: Smooth head trajectory (rotation and translation) with a window size of 7
/home/hawk/GeneFacePlusPlus/tasks/radnerfs/dataset_utils.py:266: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.lm68s = torch.tensor(self.lm2ds[:, index_lm68_from_lm478, :])
/home/hawk/GeneFacePlusPlus/inference/genefacepp_infer.py:184: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  eye_area_percents = torch.tensor(self.dataset.eye_area_percents)
Extracted wav file (16khz) from data/raw/val_wavs/ioslow.wav to data/raw/val_wavs/ioslow_16k.wav.
Loading the HuBERT Model...
/home/hawk/miniconda3/envs/geneface/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
  warnings.warn(
/home/hawk/miniconda3/envs/geneface/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Loading the Wav2Vec2 Processor...
Traceback (most recent call last):
  File "/home/hawk/GeneFacePlusPlus/inference/genefacepp_infer.py", line 593, in <module>
    GeneFace2Infer.example_run(inp)
  File "/home/hawk/GeneFacePlusPlus/inference/genefacepp_infer.py", line 537, in example_run
    infer_instance.infer_once(inp)
  File "/home/hawk/GeneFacePlusPlus/inference/genefacepp_infer.py", line 195, in infer_once
    samples = self.prepare_batch_from_inp(inp)
  File "/home/hawk/GeneFacePlusPlus/inference/genefacepp_infer.py", line 257, in prepare_batch_from_inp
    ngp_pose = self.dataset.poses[i].unsqueeze(0)
IndexError: index 974 is out of bounds for dimension 0 with size 974

ghost avatar May 15 '24 00:05 ghost

After testing, I found that the cause of this problem may be a channel mismatch between the video and the audio: the inference video is mono, while the driving audio is stereo. In my case, converting them to matching channels solved the problem.
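One way to apply the workaround described above is to downmix the driving audio to mono at 16 kHz before inference. This is only a sketch, assuming ffmpeg is on PATH; the function name and the example paths are placeholders, not part of the repository.

```python
import subprocess

def to_mono_16k(src_wav: str, dst_wav: str) -> None:
    """Downmix a (possibly stereo) driving-audio file to mono, 16 kHz, using ffmpeg."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_wav, "-ac", "1", "-ar", "16000", dst_wav],
        check=True,  # raise if ffmpeg fails instead of continuing silently
    )

# Hypothetical usage:
# to_mono_16k("data/raw/val_wavs/ioslow.wav", "data/raw/val_wavs/ioslow_mono16k.wav")
```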

ghost avatar May 23 '24 09:05 ghost

Is this issue fixed now?

MiaoJiawei97 avatar Jun 17 '24 02:06 MiaoJiawei97