End-to-End-VAD

Difference in published and generated results

Open KumudTripathi opened this issue 8 months ago • 0 comments

Hello Team,

Thanks for providing the repo.

I have replicated this repo step by step, following the details given in the paper and in this repo. First I trained both streams separately, and then I used their pretrained weights to train the multimodal architecture.

In my experiments, there is a mismatch between the result I generated (accuracy ~82%) and the published result (accuracy ~91%). Could the team give some guidance on how to reproduce the published results?

Thanks in advance.

KumudTripathi, May 31 '24 07:05