[Help]: Vevo 1.5 training without emilia dataset
Problem Overview
Thank you for the great work of Vevo 1.5!
Q1: If the Emilia dataset is not available, does training automatically skip it without raising an error? (I ask because I tried training without Emilia while keeping the config unchanged, and got no error.)
Q2: I want to train the auto-regressive transformer on a singing dataset only (without the Emilia dataset). What changes should I make in ar_synthesis.json?
Currently, I have changed "use_emilia_dataset" to false and "emilia" in "dataset" to 0:

"dataset": {
    "emilia": 0, // Originally 1. 101k hours, 34m samples
    "singnet": 20 // 400 hours, 0.34m samples * 20 = 6.8m samples
},

But I got an error:

  File "/export/fs05/zsu21/svc/Amphion/models/base/base_trainer.py", line 404, in _build_dataloader
    train_dataset = Dataset(self.cfg, self.cfg.dataset[0], is_valid=False)
  File "/export/fs05/zsu21/svc/Amphion/utils/util.py", line 498, in __getitem__
    return getattr(self, key)
TypeError: getattr(): attribute name must be string
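The traceback suggests that the base trainer indexes the "dataset" entry with the integer 0, as if it were a list, while in this config "dataset" is a mapping from names to weights. Below is a minimal sketch of that failure mode, assuming a config wrapper whose `__getitem__` forwards to `getattr` as the traceback line shows. The `ConfigNode` class is hypothetical and is not Amphion's actual class.

```python
class ConfigNode:
    """Hypothetical stand-in for a JSON-backed config object (not Amphion's class)."""

    def __init__(self, mapping):
        for key, value in mapping.items():
            # Nested dicts become nested ConfigNode objects.
            setattr(self, key, ConfigNode(value) if isinstance(value, dict) else value)

    def __getitem__(self, key):
        # Mirrors the line shown in the traceback.
        return getattr(self, key)


cfg = ConfigNode({"dataset": {"emilia": 0, "singnet": 20}})

# String keys work because they are valid attribute names.
singnet_weight = cfg["dataset"]["singnet"]

# An integer index fails, because getattr() only accepts string names.
try:
    cfg["dataset"][0]
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__

print(singnet_weight, error_type)
```

If this reading is right, the error comes from the code path taken when "use_emilia_dataset" is false, rather than from the weight of 0 itself.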
Thank you again for your great work!
In my case, setting "emilia" in "dataset" to 0 was enough, with "use_emilia_dataset" left as true. When "emilia" in "dataset" is 0, only your custom dataset is loaded.
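The comments in the config snippet imply that each integer under "dataset" is a repeat factor applied to that dataset's sample count. Assuming that reading is correct, the effective training mix can be checked with a few lines. The sample counts are the approximate figures from the config comments, and `effective_samples` is a hypothetical helper, not part of Amphion.

```python
# Approximate sample counts taken from the comments in the config snippet.
SAMPLES = {"emilia": 34_000_000, "singnet": 340_000}


def effective_samples(weights, samples=SAMPLES):
    """Return per-dataset sample counts after applying integer repeat factors."""
    return {name: samples[name] * factor for name, factor in weights.items()}


# A factor of 0 removes Emilia from the mix; singnet is repeated 20 times.
mix = effective_samples({"emilia": 0, "singnet": 20})
print(mix)
```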
Thanks! By the way, when you do want to use Emilia, what do you change besides these two lines in emilia_dataset.py?

MNT_PATH = "[Please fill out your emilia data root path]"
CACHE_PATH = "[Please fill out your emilia cache path]"
Thanks again for your reply!
In my case, I ran some extra preprocessing steps on the Emilia data: I got the NumPy arrays from Hugging Face, ran the preprocessing, and saved the results in FLAC format on my local disk. I then used them as custom data with the same configuration mentioned above.
Thanks for the answer! I still wonder whether I can use the config in emilia_dataset.py. After downloading the Emilia dataset, I saved it in a folder named "emilia", which contains a number of .arrow files (e.g., data-00000-of-00010.arrow) and two .json files (dataset_info.json, state.json). How should I modify

MNT_PATH = "[Please fill out your emilia data root path]"
CACHE_PATH = "[Please fill out your emilia cache path]"

so that training uses the Emilia dataset? Or do I need to process the downloaded Emilia dataset further?
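The folder layout described here (data-*.arrow shards plus dataset_info.json and state.json) matches what the Hugging Face `datasets` library writes with `Dataset.save_to_disk`. Below is a sketch for sanity-checking such a folder and opening it, assuming the `datasets` package is installed. `looks_like_saved_dataset` and `open_saved_dataset` are hypothetical helpers, and nothing here shows what emilia_dataset.py itself expects under MNT_PATH.

```python
from pathlib import Path


def looks_like_saved_dataset(folder):
    """Check for the metadata files and Arrow shards described in the question."""
    root = Path(folder)
    has_metadata = (root / "dataset_info.json").is_file() and (root / "state.json").is_file()
    has_shards = any(root.glob("data-*.arrow"))
    return has_metadata and has_shards


def open_saved_dataset(folder):
    """Open the folder with datasets.load_from_disk (imported lazily)."""
    from datasets import load_from_disk  # requires the `datasets` package

    if not looks_like_saved_dataset(folder):
        raise FileNotFoundError(f"{folder} does not look like a saved dataset")
    return load_from_disk(str(folder))
```

Calling `open_saved_dataset("emilia")` and inspecting the returned object's columns may help decide whether further preprocessing is needed before pointing the training code at the data.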