Knut(Ke) Chen
Hi, not really. HTS-AT itself is our proposed audio transformer; in this paper, we just use it for audio classification and SED tasks. But we use this HTS-AT architecture...
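Hello, the input spectrogram size is 256 x 256. The spectrogram does not actually need to be converted to the model's input size: the same patching can be done on the original 1024 * 64 spectrogram. But because we wanted to use the Swin Transformer pretrained model to improve performance, we did a rearrange.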
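One way to picture the rearrangement from a 1024 x 64 spectrogram to a 256 x 256 input is to cut the time axis into 4 chunks and place them side by side along the frequency axis. The sketch below is only an illustration under that assumption: the function name `fold_time_into_frequency`, the parameter `freq_ratio`, and the exact chunk ordering are hypothetical and may differ from what the repository does. It uses plain nested lists so it runs without any dependencies.

```python
def fold_time_into_frequency(spec, freq_ratio):
    """Rearrange a (T, F) spectrogram into (T // freq_ratio, F * freq_ratio).

    Hypothetical helper: splits the time axis into `freq_ratio` equal chunks
    and concatenates them along the frequency axis.
    """
    num_frames = len(spec)
    if num_frames % freq_ratio != 0:
        raise ValueError("number of frames must be divisible by freq_ratio")
    chunk_len = num_frames // freq_ratio

    folded = []
    for t in range(chunk_len):
        row = []
        for k in range(freq_ratio):
            # Frame t of chunk k becomes columns [k * F, (k + 1) * F) of row t.
            row.extend(spec[k * chunk_len + t])
        folded.append(row)
    return folded


# Dummy spectrogram: 1024 time frames x 64 mel bins, each cell uniquely numbered.
spec = [[float(t * 64 + f) for f in range(64)] for t in range(1024)]
image = fold_time_into_frequency(spec, 4)
print(len(image), len(image[0]))  # 256 256
```

No values are discarded or interpolated in this sketch: all 1024 * 64 = 65,536 cells are kept, only laid out as a square 256 x 256 grid that matches the input size expected by an image-pretrained model.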
Already Answered in another issue.
Hi, our pretrained checkpoint is released; please check the [released link](https://drive.google.com/drive/folders/1f5VYMk0uos_YnuBshgmaTVioXbs7Kmz6) in our README. For AudioSet, you can refer to [this repo](https://github.com/qiuqiangkong/audioset_tagging_cnn); we use their stored AudioSet (please check the referred repo's...
Hi, sorry for the late reply. You can refer to [this](https://github.com/RetroCirce/HTS-Audio-Transformer/issues/21) issue. Basically, the cause is the environment and some hyperparameter changes. My environment (when I did this project)...
Hi, sorry for the late reply. You need to revise or refer to the data_processor.py file to change the dataset loader and dataset classes. I use the "LGSPDataset" module to...
Training on 8 V100 GPUs for 1-2 days can reach the reported performance. With one GPU, you can perhaps also reach it with 5-7 days of training.
Hi @lucidrains, we have briefly scanned your code and it looks great to us. After you finish the code, just let us know. We will go mainly over the spec-augment...
Hi, with this project's encoding method, it is not easy to add velocity. But some follow-up works using transformer architectures and advanced representations of music (from 2021 and 2022)...
Hi, I used dataset_idx/dataloader_idx because I evaluate multiple test/validation sets while training the model. I.e., after training for 1 epoch, I test each validation set (namely idx...
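The idea of indexing several validation sets and evaluating each one after an epoch can be sketched in plain Python. This is only an illustration, not the repository's code: `evaluate_per_dataset`, `predict`, and the toy data are hypothetical names, and accuracy stands in for whatever metric is actually reported.

```python
def evaluate_per_dataset(predict, val_sets):
    """Return one accuracy value per validation set, keyed by its index."""
    results = {}
    for dataset_idx, dataset in enumerate(val_sets):
        # Count the examples whose prediction matches the label.
        correct = sum(1 for x, y in dataset if predict(x) == y)
        results[dataset_idx] = correct / len(dataset)
    return results


# Toy "model" and two tiny validation sets of (input, label) pairs.
predict = lambda x: x % 2
val_sets = [
    [(1, 1), (2, 0)],  # idx 0
    [(1, 0), (2, 0)],  # idx 1
]
metrics = evaluate_per_dataset(predict, val_sets)
print(metrics)  # {0: 1.0, 1: 0.5}
```

Keeping the results keyed by index lets each set be logged separately instead of being merged into one number.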