Add FreeVC implementation

Open Nugine opened this issue 9 months ago • 4 comments

✨ Description

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion

  • https://arxiv.org/abs/2210.15418
  • https://github.com/OlaWod/FreeVC

This PR is part of our AIR6063 final project.

  • @Nugine (223040051)
  • @SeanYouLaw (223040034)

FYI, we also have another repo which refactors the training pipeline. Both the PR code and the custom code can produce good checkpoints.

Here are our checkpoints, trained with the PR code on a single NVIDIA RTX 4090:

  • 120000 steps, 183 epochs, 14.53 hours for each ckpt (freevc, freevc-s, freevc-nosr)
  • 290000 steps, 443 epochs, 35.98 hours for each ckpt (freevc, freevc-s, freevc-nosr)
  • 300000 steps, 458 epochs, 36.53 hours for each ckpt (freevc)
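
As a quick consistency check on the figures above, the sketch below derives steps per epoch and training throughput for each run. It uses only the numbers listed in this description; the names `runs` and `summarize` are made up for illustration and are not part of the PR code.

```python
# Figures copied from the checkpoint list above: (steps, epochs, hours).
runs = [
    (120_000, 183, 14.53),
    (290_000, 443, 35.98),
    (300_000, 458, 36.53),
]


def summarize(steps, epochs, hours):
    """Return (steps per epoch, steps per second) for one reported run."""
    return steps / epochs, steps / (hours * 3600)


for steps, epochs, hours in runs:
    per_epoch, per_sec = summarize(steps, epochs, hours)
    print(f"{steps:>7} steps: {per_epoch:6.1f} steps/epoch, {per_sec:.2f} steps/s")
```

All three runs come out at roughly 655 steps per epoch and a little over 2 steps per second, so the reported step counts, epoch counts, and wall-clock times agree with each other.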

🚧 Related Issues

During the project, we opened several issues and another PR to help improve Amphion.

  • #195
  • #196
  • #197
  • #198

👨‍💻 Changes Proposed

  • [x] Add FreeVC in models
  • [x] Add FreeVC in egs

🧑‍🤝‍🧑 Who Can Review?

@zhizhengwu @RMSnow @Adorable-Qin

✅ Checklist

  • [ ] Code has been reviewed
  • [ ] Code complies with the project's code standards and best practices
  • [ ] Code has passed all tests
  • [ ] Code does not affect the normal use of existing features
  • [ ] Code has been commented properly
  • [ ] Documentation has been updated (if applicable)
  • [ ] Demo/checkpoint has been attached (if applicable)

May 07 '24 13:05 Nugine

Here are some examples of our results:

https://github.com/open-mmlab/Amphion/assets/58773169/3256156a-d77a-4079-a437-6b163a8d0c12

https://github.com/open-mmlab/Amphion/assets/58773169/02a1e70e-ec1b-4755-b4e5-9e50ebdb4620

https://github.com/open-mmlab/Amphion/assets/58773169/cac3e57b-782e-43f7-96d5-48420c113f23


https://github.com/open-mmlab/Amphion/assets/58773169/0bec78cd-1543-44dd-9673-9842dd96f7fa

https://github.com/open-mmlab/Amphion/assets/58773169/a7faf334-d863-4d11-beb5-4bd220055a7b

https://github.com/open-mmlab/Amphion/assets/58773169/055a3d9b-572f-4079-811a-f15becbc2c83


https://github.com/open-mmlab/Amphion/assets/58773169/2fc73678-0e60-4665-89ce-52c96aeeaded

https://github.com/open-mmlab/Amphion/assets/58773169/56dec48f-cc1d-403c-af30-ef80b29343a7

https://github.com/open-mmlab/Amphion/assets/58773169/fa8e6091-70f0-431f-b809-82c4c9bee39d

May 07 '24 13:05 SeanYouLaw

The samples sound good. @Adorable-Qin, please check the code and documentation carefully.

May 07 '24 14:05 RMSnow

Here are some more examples of our results, generated with the checkpoint trained for 183 epochs (120k steps). The examples above are from the pretrained checkpoint.

https://github.com/open-mmlab/Amphion/assets/58773169/3b85076d-187c-4329-9405-f71eea734b89

https://github.com/open-mmlab/Amphion/assets/58773169/b3a1f57a-7e56-4ccc-a319-f7f22bdf11f5

https://github.com/open-mmlab/Amphion/assets/58773169/bbde9081-e414-41e8-848b-304176d871cd

https://github.com/open-mmlab/Amphion/assets/58773169/6754c030-5703-4b5f-afce-85e49282eb55

https://github.com/open-mmlab/Amphion/assets/58773169/0f7b0bc7-8b03-456e-9f59-4ba1b13be955

https://github.com/open-mmlab/Amphion/assets/58773169/de76a1da-95df-4aab-a727-1ccd24bcaaf0

https://github.com/open-mmlab/Amphion/assets/58773169/d849b433-5478-46ca-ae85-f9737bf6de57

https://github.com/open-mmlab/Amphion/assets/58773169/0e1dfd5b-c09f-4da9-842b-94acb49c2bdc

https://github.com/open-mmlab/Amphion/assets/58773169/6ae3b513-035c-42fb-8f0f-cf5041098850

May 08 '24 13:05 SeanYouLaw

Our AutoDL server will expire tomorrow. Here is a demo video showing the training status.

https://github.com/open-mmlab/Amphion/assets/30099658/ca69347c-cd65-4052-8666-749900cb12ab

May 08 '24 15:05 Nugine