HTS-Audio-Transformer
Has this framework's output been compared with other features?
Has this framework's output been compared with other features such as wav2vec or HuBERT?
Hi,
Not really. HTS-AT is the audio transformer we propose in this paper, and here we use it only for audio classification and SED tasks. We have also used the HTS-AT architecture in other tasks, such as contrastive language-audio pretraining (CLAP). We compare this audio representation with other TF-domain SoTA models. I think it could be compared with wav2vec, although we have not conducted such experiments.