Yuancheng0625
Hi, you can download the vocoder checkpoint from https://huggingface.co/amphion/text_to_audio/tree/main/tta/hifigan_checkpoints
If the model is saved as model.safetensors (i.e., not pytorch_model.bin), please run `pip install accelerate==0.24.1`.
Hi, we opened a PR to fix this problem. You can check it! (We use `from diffusers.optimization import get_cosine_schedule_with_warmup`.)
Hi, we haven't tested NoamScheduler. I think using AdamW with a learning rate between 5e-5 and 1e-4, together with a cosine schedule with between 5K and 10K warmup steps, will give a more...
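For reference, the warmup-then-cosine behavior suggested above can be sketched in plain Python. The step counts and base learning rate below are illustrative values within the suggested ranges, not the project's official config; in practice you would pass an AdamW optimizer to diffusers' `get_cosine_schedule_with_warmup` instead of computing this by hand:

```python
import math

def cosine_schedule_with_warmup(step, warmup_steps, total_steps, base_lr):
    """Cosine learning-rate schedule with linear warmup.

    Mirrors the shape of diffusers' get_cosine_schedule_with_warmup:
    lr ramps linearly from 0 to base_lr over warmup_steps, then decays
    to 0 along half a cosine over the remaining steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Illustrative values: base lr 1e-4, 5K warmup, 100K total steps.
lrs = [cosine_schedule_with_warmup(s, 5_000, 100_000, 1e-4)
       for s in range(0, 100_001, 5_000)]
```

The learning rate peaks at `base_lr` exactly when warmup ends and reaches zero at the final step.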
Sure, we will provide the script soon, and we will also provide the processed data for AudioCaps (or more) so you can download it directly.
> Hi, can you give us some direction of how to use pre-trained models in the TTA or TTM recipe. How do we include these pre-trained make-an-audio (https://drive.google.com/drive/folders/1zZTI3-nHrUIywKFqwxlFO6PjB66JA8jI) or this...
We will release the AudioCaps dataset on Hugging Face within one week!
> Thank you for your great work! Could you please provide more information about the data format? I am trying to encode a 10s 24000fps wav file into embedding space...
We have released the processed AudioCaps dataset: https://openxlab.org.cn/datasets/Amphion/AudioCaps
Hi, since we use the official HiFi-GAN repo to train the vocoder for TTA, you can refer to https://github.com/jik876/hifi-gan for converting a waveform to a mel-spectrogram.
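As a rough illustration of what that conversion involves, here is a NumPy-only sketch of a log-mel-spectrogram: Hann-windowed magnitude STFT, a triangular mel filterbank, then a clamped log. All parameters (n_fft=1024, hop=256, 80 mels, HTK mel scale) are assumptions for illustration; HiFi-GAN's own `meldataset.py` uses librosa's Slaney-style filterbank and torch's STFT with its config's exact values, so use that code for training-compatible features:

```python
import numpy as np

def hz_to_mel(f):
    # HTK mel scale; hifi-gan uses librosa's Slaney-style filterbank,
    # so this is a simplified approximation, not a drop-in replacement.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    # Triangular filters spaced evenly on the mel scale.
    fmax = fmax or sr / 2
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(1, center - left)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(1, right - center)
    return fb

def mel_spectrogram(wav, sr=24000, n_fft=1024, hop=256, n_mels=80):
    # Frame the signal with a Hann window and take the magnitude STFT.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))   # (frames, n_fft//2+1)
    mel = mag @ mel_filterbank(sr, n_fft, n_mels).T      # (frames, n_mels)
    return np.log(np.clip(mel, 1e-5, None))              # clamped log-mel

# e.g. one second of a 440 Hz sine at 24 kHz -> (frames, 80) log-mel matrix
wav = np.sin(2 * np.pi * 440 * np.arange(24000) / 24000).astype(np.float32)
M = mel_spectrogram(wav)
```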