Cross-Modal-BERT
audio data
Hello, I used Wav2Vec2 to extract audio features. The dimension of each audio sample is (50, 512), so I changed the Conv1d input size to 512, but the accuracy is always zero. Do you have any suggestions?
Can you tell us more about your approach?
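One thing that may be worth checking, offered only as a guess since the thread does not show the model code: PyTorch's `nn.Conv1d` expects input shaped (batch, channels, length), so features shaped (batch, 50, 512) need the 512-dim axis moved to the channel position before the convolution. The sketch below assumes PyTorch; the function name `project_audio_features`, the output size of 768, and the kernel size of 1 are illustrative assumptions, not taken from the Cross-Modal-BERT repository.

```python
import torch
import torch.nn as nn


def project_audio_features(features: torch.Tensor, conv: nn.Conv1d) -> torch.Tensor:
    """Apply a Conv1d over features shaped (batch, time_steps, feature_dim).

    Hypothetical helper: transposes so the feature axis becomes the channel
    axis that nn.Conv1d expects, then transposes back afterwards.
    """
    x = features.transpose(1, 2)   # (batch, feature_dim, time_steps)
    x = conv(x)                    # (batch, out_channels, time_steps')
    return x.transpose(1, 2)       # (batch, time_steps', out_channels)


# Random stand-in for Wav2Vec2 features: 4 samples, 50 time steps, 512 dims.
features = torch.randn(4, 50, 512)

# Assumed projection: 512 input channels to 768 output channels, kernel size 1.
conv = nn.Conv1d(in_channels=512, out_channels=768, kernel_size=1)

out = project_audio_features(features, conv)
print(out.shape)
```

If the shapes already line up, the zero accuracy probably comes from somewhere else (for example labels or the loss), which is why more detail about the setup would help.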