David Gimeno-Gómez

Results 11 comments of David Gimeno-Gómez

You are right, I forgot to include the error trace! Here is an example of the error raised when setting ```maxlenratio``` to 1.2: ``` Traceback (most recent call last):...

Thanks for sharing that tutorial. It helped me understand how to set ```maxlenratio``` and ```minlenratio``` more appropriately, depending on whether the hypothesis I am getting suffered...
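
As a rough illustration of what these two options control, here is a minimal sketch. It assumes ESPnet-style beam search (the comment does not name the toolkit), where the ratios are multiplied by the number of encoder frames to bound the hypothesis length. The helper name `length_bounds` is hypothetical.

```python
def length_bounds(num_frames, maxlenratio, minlenratio):
    """Return (minlen, maxlen) for a hypothesis, given the encoder length.

    Assumed convention (modeled on ESPnet-style beam search):
      maxlenratio == 0 -> maxlen equals the number of encoder frames
      maxlenratio  < 0 -> its absolute value is taken as a fixed maxlen
      maxlenratio  > 0 -> maxlen is a fraction/multiple of the encoder length
    """
    if maxlenratio == 0:
        maxlen = num_frames
    elif maxlenratio < 0:
        maxlen = -int(maxlenratio)
    else:
        maxlen = max(1, int(maxlenratio * num_frames))
    minlen = int(minlenratio * num_frames)
    return minlen, maxlen


# Example: 100 encoder frames, hypotheses limited to 25..50 tokens.
bounds = length_bounds(100, maxlenratio=0.5, minlenratio=0.25)
```

Under this convention, a hypothesis that is cut off too early suggests raising `maxlenratio`, while one that ends prematurely with very few tokens suggests raising `minlenratio`.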

@Developer1881 My intuition is that, at some point in the model's forward pass, the cropped face is embedded in a 64x64 latent representation, and then a heatmap is predicted for each...

Although this may come a bit late, that 104 refers to the number of features that make up the audio input tensor. In order to make the model work with audio stream...
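
One plausible way to arrive at 104 features per step is sketched below. It assumes, and this is not stated in the comment, that 26 filterbank features are computed per audio frame and that 4 consecutive frames are stacked so the audio rate matches the video frame rate (26 x 4 = 104). The function name `stack_frames` is hypothetical.

```python
import numpy as np


def stack_frames(feats, stack_order=4):
    """Concatenate every `stack_order` consecutive frames into one vector.

    feats: array of shape (T, F). The result has shape
    (ceil(T / stack_order), F * stack_order); the tail is zero-padded
    when T is not a multiple of stack_order.
    """
    feat_dim = feats.shape[1]
    remainder = len(feats) % stack_order
    if remainder != 0:
        pad = np.zeros([stack_order - remainder, feat_dim], dtype=feats.dtype)
        feats = np.concatenate([feats, pad], axis=0)
    return feats.reshape(-1, stack_order * feat_dim)


# Assumed sizes: 398 audio frames with 26 features each.
audio = np.random.rand(398, 26).astype(np.float32)
stacked = stack_frames(audio)
print(stacked.shape)  # (100, 104)
```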

Digging through my scripts, I found that I used this padding trick:

```
diff = len(audio_feats) - len(video_frames)
if diff < 0:
    audio_feats = np.concatenate([
        audio_feats,
        np.zeros([-diff, audio_feats.shape[-1]], dtype=audio_feats.dtype),...
```
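
A self-contained sketch of this kind of length alignment follows. The zero-padding branch mirrors the snippet above; the trimming branch for the case where the audio is longer than the video is an assumption, since the original comment is cut off. The function name `align_audio_to_video` is hypothetical.

```python
import numpy as np


def align_audio_to_video(audio_feats, num_video_frames):
    """Make the audio feature sequence as long as the video sequence."""
    diff = len(audio_feats) - num_video_frames
    if diff < 0:
        # Audio is shorter: append -diff rows of zeros.
        pad = np.zeros([-diff, audio_feats.shape[-1]], dtype=audio_feats.dtype)
        audio_feats = np.concatenate([audio_feats, pad])
    elif diff > 0:
        # Assumption: audio is longer, so drop the surplus trailing frames.
        audio_feats = audio_feats[:-diff]
    return audio_feats


short_audio = np.ones((97, 104), dtype=np.float32)
long_audio = np.ones((103, 104), dtype=np.float32)
padded = align_audio_to_video(short_audio, 100)
trimmed = align_audio_to_video(long_audio, 100)
```

Both `padded` and `trimmed` end up with 100 rows, so they can be paired frame by frame with a 100-frame video.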

I am glad it worked :) However, I've never tried to extract features from a specific intermediate layer of AV-HuBERT, so in this case I cannot help you. I guess...

In the field of ASR, the raw audio waveform is typically processed to extract the well-established Mel Frequency Cepstral Coefficients (MFCCs). I recommend reading more about how these audio...

Thanks so much for your reply, it solved the problem! Nonetheless, I have a new question :) First of all, let me explain my purpose. I am working...

Before exploring the AV-HuBERT system with my own database, I wanted to see whether I could reach similar performance with the LRS3 database. The point is that I had already prepared...

Thank you for clarifying these aspects! Now, I would like to ask how I can fine-tune this pre-trained model. I ran this command: `fairseq-hydra-train --config-dir ${PWD}/conf/finetune/ --config-name base_vox_433h.yaml task.data=${PWD}/data/LRS3-TED/speaker-independent/...