autovc
What is the format of the metadata?
I want to try another audio, so I checked the data inside, but I don't know what the second field is. The first one is the name, and the third is the mel-spectrogram.
And does this apply to Chinese audio, or do I need to retrain the model with Chinese data? Thanks!
For Chinese audio, you need to retrain the model and retune the hyperparameters.
What is the difference between the train and test metadata? I created metadata from Persian waves, but its format is not like yours. I can train the network, but I can't test it. The third section of my metadata is the path of the .npy files created by make_spect.py. Please help me; sorry, I'm confused. Thanks a lot.
The metadata is different depending on the use case. It is nothing but some sort of nested list. You can easily make your own by looking into one of these metadata files.
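For anyone else stuck here, a minimal sketch for inspecting one of these files, assuming it was pickled the same way as the metadata.pkl shipped with the repo:

```python
import pickle

import numpy as np

# Load the pickled metadata; it is a nested list with one entry per speaker.
with open('metadata.pkl', 'rb') as f:
    metadata = pickle.load(f)

for entry in metadata:
    speaker_id, embedding = entry[0], entry[1]
    print(speaker_id, np.asarray(embedding).shape)
    # What follows the first two fields depends on the use case:
    # path strings to .npy spectrograms for training, or the raw
    # mel-spectrogram arrays for testing/conversion.
    for item in entry[2:]:
        print('  ', item if isinstance(item, str) else np.asarray(item).shape)
```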
Thank you for your explanation. I can't understand what the third section is or how to generate it. What is the array that's highlighted in the picture?
Thanks for your support.
Can you print the shape of it?
Just let me know the shape.
The shape of your metadata is (4, 3), but mine is (2,).
I mean the shape of the 3rd section.
Sorry, my fault. The third section is a string containing the paths of the spectrograms, like this: 's1\p1_1.npy', 's1\p1_2.npy', 's1\p1_3.npy'
"I can't understand what is the third section and how to generate it? What is array that's highlight in picture?"
This was your original question. What is the shape of the 3rd section you were refering to?
The picture I sent was related to your metadata; the shape of the 3rd section of your metadata is (3,). I want to generate my metadata like the one in the picture. Sorry if I didn't explain well.
There are definitely more than 3 elements in your highlighted area.
I'm so sorry, my fault again. The shapes are (90, 80), (89, 80), (75, 80), and (109, 80). The metadata includes 4 speakers.
These are the spectrograms.
So what is the previous array in the second section? Can you send me the Python file? I'm so confused. Thanks again for your good support.
Again, the shape please.
Also, where did you get that metadata?
The shapes are (256,), (256,), (256,), (256,). I sent you an email, and you sent me your project.
Those are the speaker embeddings.
In that case, you already had the code to generate this. If not, you can write your own very easily. I don't keep the code because it is too simple.
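Something like the sketch below may do, under the assumption that the spectrograms from make_spect.py are on disk and you already have one (256,) embedding per speaker; the directory layout and file names here are made up for illustration:

```python
import os
import pickle

import numpy as np

# Hypothetical layout: one folder of .npy mel-spectrograms per speaker
# under spect_dir, plus a precomputed (256,) speaker embedding saved
# next to it as <speaker>_emb.npy. Adjust to your own setup.
spect_dir = './spmel'
speakers = ['s1', 's2']

metadata = []
for spk in speakers:
    emb = np.load(os.path.join(spect_dir, spk + '_emb.npy'))  # shape (256,)
    entry = [spk, emb]
    for fname in sorted(os.listdir(os.path.join(spect_dir, spk))):
        if fname.endswith('.npy'):
            # For test metadata, append the mel-spectrogram itself
            # (shape (T, 80)); for training metadata you would append
            # the relative path string instead.
            entry.append(np.load(os.path.join(spect_dir, spk, fname)))
    metadata.append(entry)

with open('metadata.pkl', 'wb') as f:
    pickle.dump(metadata, f)
```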
@mhosein4 did you understand the metadata format? Because I'm trying to run this code now, and I can see that the "metadata.pkl" file in the git is NOT the same as the metadata file that would be generated by "make_metadata.py".
in "metadata.pkl" for every singer there are:
- str for the id of the singer
- embedding
- mel-spec for the songs in the dataset
but when generating a metadata file with "make_metadata.py", it generates:
- a str for the id of the singer
- the embedding
- the names, of type string(!), of the songs in the dataset.
So I can't use it... I saw in another issue that someone said the metadata for training and testing is different, but I can't understand how and where in the code.
Thanks!
@amiteliav in case it's still relevant, you can find an end-to-end implementation in this repo/notebook: https://github.com/KnurpsBram/AutoVC_WavenetVocoder_GriffinLim_experiments/blob/master/AutoVC_WavenetVocoder_GriffinLim_experiments_17jun2020.ipynb
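For readers hitting the same mismatch, a minimal sketch of bridging the two formats by loading the referenced .npy files, assuming the fields from the third onward hold paths relative to the spectrogram root (the 'train.pkl' file name here is an assumption):

```python
import os
import pickle

import numpy as np

root = './spmel'  # directory that make_spect.py wrote the .npy files into

# Training-style metadata: [speaker_id, embedding, path, path, ...]
with open(os.path.join(root, 'train.pkl'), 'rb') as f:
    train_meta = pickle.load(f)

# Replace each path string with the actual mel-spectrogram array to get
# the test-style layout of the metadata.pkl shipped with the repo.
test_meta = []
for entry in train_meta:
    speaker_id, embedding = entry[0], entry[1]
    mels = [np.load(os.path.join(root, p)) for p in entry[2:]]
    test_meta.append([speaker_id, embedding] + mels)

with open('metadata.pkl', 'wb') as f:
    pickle.dump(test_meta, f)
```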