lokvke comments

Results: 10 comments by lokvke

how can i get ROI from nvcv.ImageBatchVarShape

> Hi @lokvke, thank you very much for your interest in CVCUDA! Could you share some information on your use case? E.g., would you like to get a ROI...

how can i get ROI from nvcv.ImageBatchVarShape

> Hi @lokvke, then you could use the padandstack operator, which takes an ImageBatchVarShape as input and outputs a Tensor of the cropped size. Below is a...

[QUESTION] torch.as_tensor(nvcv.Tensor.cuda()) copy the data from gpu to cpu?

> Another question, can I convert ImageBatchVarShape to nvcv.Tensor or torch.Tensor directly? As far as I know, I have to process it one by one. +1

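I trained the syncnet model on the Chinese Mandarin Lip Reading (CMLR) dataset for 40,000 steps. The final sync loss plateaued at 0.34, which seems somewhat high. By comparison, when training on LRS3 the sync loss reaches about 0.25 by 15,000 steps.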

@yulj21 Does the May pretrained model provided by the author not support synthesis from Chinese audio?

IndexError: index 676 is out of bounds for dimension 0 with size 676

In the inject_blink_to_lm68 function, when the generated video contains 676 frames, T=676. So when i=675 and j=1, idx=676, which is out of bounds. Here is my solution: **idx = i % (i +...
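The off-by-one above can also be avoided generically by never producing an index at or beyond T. The following is only a minimal sketch and is not the truncated fix quoted above: the helper name `blink_frame_indices` and its parameters are hypothetical and do not come from the project. It assumes a blink pattern of `blink_len` frames is written starting at frame `start`.

```python
def blink_frame_indices(start, blink_len, total_frames):
    """Return the frame indices a blink would touch, dropping any past the end.

    Hypothetical helper: `start` is the first blink frame (i), `blink_len` is
    the number of frames in the blink pattern (the range of j), and
    `total_frames` is T.
    """
    return [start + j for j in range(blink_len) if start + j < total_frames]


if __name__ == "__main__":
    # With T=676, a blink starting at the last frame touches only that frame.
    print(blink_frame_indices(675, 3, 676))
```

Dropping the out-of-range frames truncates a blink that starts near the end of the video. Clamping with `min(start + j, total_frames - 1)` would be an alternative, at the cost of writing the last frame several times.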

size mismatch for blink_encoder.1.weight

eye_blink_dim is 2 in lm3d_radnerf_sr.yaml but 4 in lm3d_radnerf_torso_sr.yaml.

only render with the head model, torso part shows

After 250,000 training steps, the torso part is still rendered with the head model only. This did not happen in the GeneFace project.

[BUG] Generated audio content is wrong when the text is short

The audio synthesized for short sentences is garbled. Has this been resolved?

I followed the author's implementation to integrate GaussianTalker myself. Why does asyncio.Queue keep blocking? Could anyone who knows asynchronous programming help me?
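The cause in this particular integration is not known from the comment. Two things commonly make an `asyncio.Queue` appear stuck: a blocking call inside a coroutine stalls the whole event loop, and `asyncio.Queue` is not thread-safe, so putting items from another thread may never wake the waiting `get()`. The sketch below shows one way to keep the loop responsive by running the blocking step in a worker thread with `asyncio.to_thread` (Python 3.9+). The function `render_frame` is a hypothetical stand-in for the model's inference call.

```python
import asyncio
import time


def render_frame(i):
    # Hypothetical stand-in for a blocking inference call.
    time.sleep(0.01)
    return f"frame-{i}"


async def producer(queue, n):
    for i in range(n):
        # Run the blocking call in a worker thread so the event loop keeps running.
        frame = await asyncio.to_thread(render_frame, i)
        await queue.put(frame)
    await queue.put(None)  # Sentinel: tells the consumer there is no more data.


async def consumer(queue):
    frames = []
    while True:
        item = await queue.get()
        if item is None:
            break
        frames.append(item)
    return frames


async def main(n=3):
    queue = asyncio.Queue(maxsize=2)
    producer_task = asyncio.create_task(producer(queue, n))
    frames = await consumer(queue)
    await producer_task
    return frames


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If the frames must be produced from a separate thread instead, `loop.call_soon_threadsafe(queue.put_nowait, item)` is the usual way to hand them to the loop that owns the queue.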

How are GaussianTalker's speed and quality?

internVL2-26B shows "repeater" behavior (repetitive output)

When using internvl2-26b to extract category names from images, the repetition also occurs occasionally, as shown in the image: ![image](https://github.com/user-attachments/assets/c6fdd3f5-b768-40e2-86bb-a79cd7d578a6)