[Help]: Vevo 1.5 training without emilia dataset

Open mangoszteen opened this issue 6 months ago • 4 comments

Problem Overview

Thank you for the great work on Vevo 1.5!

Q1: If the emilia dataset is not available, does training automatically skip it without raising an error? (I ask because I tried training without emilia while keeping the config unchanged, and got no error.)

Q2: I want to train the auto-regressive transformer with only a singing dataset (without the emilia dataset). What changes should I make in ar_synthesis.json?

Currently, I have changed "use_emilia_dataset" to false and set "emilia" in "dataset" to 0:

    "dataset": {
        "emilia": 0, // Originally 1. 101k hours, 34m samples
        "singnet": 20 // 400 hours, 0.34m samples * 20 = 6.8m samples
    },

But I got an error:

    File "/export/fs05/zsu21/svc/Amphion/models/base/base_trainer.py", line 404, in _build_dataloader
        train_dataset = Dataset(self.cfg, self.cfg.dataset[0], is_valid=False)
    File "/export/fs05/zsu21/svc/Amphion/utils/util.py", line 498, in __getitem__
        return getattr(self, key)
    TypeError: getattr(): attribute name must be string

Thank you again for your great work!

mangoszteen, Jul 15 '25 03:07
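
The TypeError in the traceback above comes from indexing the "dataset" config entry with an integer (self.cfg.dataset[0]) while that entry is a mapping whose __getitem__ forwards the key to getattr. The sketch below reproduces the failure with a hypothetical stand-in class (HParams is an assumption for illustration, not Amphion's actual config class). It suggests that the code path taken when "use_emilia_dataset" is false expects "dataset" to be a list rather than a name-to-weight mapping.

```python
class HParams:
    """Hypothetical stand-in for the config wrapper seen in the traceback."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            # Wrap nested dicts so they expose attribute access as well.
            setattr(self, key, HParams(**value) if isinstance(value, dict) else value)

    def __getitem__(self, key):
        # Same pattern as the line shown in the traceback.
        return getattr(self, key)


cfg = HParams(dataset={"emilia": 0, "singnet": 20})

# String keys work, because getattr receives an attribute name.
print(cfg["dataset"]["singnet"])

# An integer index fails, because getattr requires a string name.
raised = False
try:
    cfg.dataset[0]
except TypeError as err:
    raised = True
    print("TypeError:", err)
```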

In my case, just setting "emilia" in "dataset" to 0 is enough; "use_emilia_dataset" stays set to true. When "emilia" in "dataset" is set to 0, only your custom dataset is loaded.

josephwong14wkh, Jul 16 '25 04:07
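
The comment in the original config ("0.34m samples * 20 = 6.8m samples") suggests that each value under "dataset" acts as a repeat factor on that dataset, so a value of 0 removes it from the mix. The sketch below works through that arithmetic under this assumption, using the rounded sample counts from the config comments; it does not call any Amphion code.

```python
# Approximate sample counts taken from the comments in ar_synthesis.json.
base_samples = {"emilia": 34_000_000, "singnet": 340_000}

# Weights as suggested in this thread: emilia disabled, singnet repeated 20 times.
weights = {"emilia": 0, "singnet": 20}

# Assumption: the weight multiplies the number of samples drawn from each dataset.
effective = {name: base_samples[name] * weights[name] for name in weights}

for name, count in effective.items():
    print(f"{name}: {count:,} effective samples")
```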

Thanks! BTW, when you do want to use emilia, what do you change besides these 2 lines in emilia_dataset.py?

    MNT_PATH = "[Please fill out your emilia data root path]"
    CACHE_PATH = "[Please fill out your emilia cache path]"

Thanks again for your reply!

mangoszteen, Jul 16 '25 11:07

In my case, I did some extra preprocessing on the data in the emilia dataset. I got the numpy arrays from Hugging Face, ran the preprocessing steps, and saved the results in FLAC format on my local disk. I then use them like custom data, with the same configuration mentioned above.

josephwong14wkh, Jul 18 '25 14:07
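
The workflow described above (numpy array in, preprocessing, FLAC file out) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the output layout, and the preprocess callable are hypothetical, and the third-party soundfile package is assumed to be installed. The commenter's actual preprocessing steps are not given in the thread.

```python
from pathlib import Path


def flac_path(out_dir, sample_id):
    """Build the output path for one utterance (hypothetical layout)."""
    return Path(out_dir) / f"{sample_id}.flac"


def export_sample(array, sample_rate, out_dir, sample_id, preprocess=None):
    """Optionally preprocess a waveform array, then save it as a FLAC file."""
    import soundfile as sf  # third-party; imported lazily so the helper above works without it

    if preprocess is not None:
        # Placeholder for whatever preprocessing is needed (e.g. resampling, trimming).
        array = preprocess(array)

    target = flac_path(out_dir, sample_id)
    target.parent.mkdir(parents=True, exist_ok=True)
    # soundfile infers the FLAC container from the ".flac" extension.
    sf.write(str(target), array, sample_rate)
    return target
```

The exported folder can then be pointed to as a custom dataset, as described in the comment above.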

Thanks for the answer! I still wonder whether I can use the config in emilia_dataset.py. After downloading the emilia dataset, I saved it in a folder named "emilia", which contains a number of .arrow files (e.g., data-00000-of-00010.arrow) and 2 .json files (dataset_info.json, state.json). How should I modify the following lines so that training uses the emilia dataset?

    MNT_PATH = "[Please fill out your emilia data root path]"
    CACHE_PATH = "[Please fill out your emilia cache path]"

Or do I need to process the downloaded emilia dataset further?

mangoszteen, Aug 01 '25 17:08