What is the format of the metadata?

Open 1015720437 opened this issue 5 years ago • 21 comments

What is the format of the metadata? I want to try a different audio file. I checked the data inside, but I don't know what the second element is. The first one is the name, and the third is the mel-spectrogram.

Also, does this apply to Chinese audio, or do I need to retrain the model on Chinese data? Thanks!

1015720437 avatar Aug 14 '19 07:08 1015720437

For Chinese audio, you need to retrain the model and retune the hyperparameters.

auspicious3000 avatar Aug 15 '19 01:08 auspicious3000

What is the difference between the train and test metadata? I created metadata from Persian wave files, but its format is not like yours. I can train the network, but I can't test it. The third section of my metadata is the paths of the .npy files created by make_spect.py. Please help me; sorry, I'm confused. Thanks a lot.

mhosein4 avatar Jun 04 '20 20:06 mhosein4

The metadata differs depending on the use case. It is nothing but a nested list. You can easily make your own by looking into one of these metadata files.

auspicious3000 avatar Jun 04 '20 21:06 auspicious3000
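
For readers who want to look into a metadata file as suggested above, here is a minimal sketch that walks a nested list and reports the type and shape of every field. The helper name `describe_metadata` is hypothetical, and the toy entry uses zero-filled arrays with the shapes mentioned later in this thread, not real data.

```python
import pickle

import numpy as np


def describe_metadata(metadata):
    """Return one description line per field of each metadata entry."""
    lines = []
    for i, entry in enumerate(metadata):
        for j, field in enumerate(entry):
            if isinstance(field, np.ndarray):
                desc = f"ndarray shape={field.shape} dtype={field.dtype}"
            else:
                desc = f"{type(field).__name__}: {field!r}"
            lines.append(f"entry {i} field {j}: {desc}")
    return lines


# Toy stand-in for a real metadata file (zero-filled placeholder arrays).
toy = [["p001", np.zeros(256, dtype=np.float32), np.zeros((90, 80), dtype=np.float32)]]

# Round-trip through pickle to mimic loading a .pkl file from disk.
restored = pickle.loads(pickle.dumps(toy))
for line in describe_metadata(restored):
    print(line)
```

To inspect a real file, load it with `pickle.load` on a file opened in binary mode and pass the result to the same helper. Only unpickle files from a source you trust.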

Thank you for your explanation. I can't understand what the third section is or how to generate it. What is the array that is highlighted in the picture?

image

Thanks for your support.

mhosein4 avatar Jun 05 '20 09:06 mhosein4

Can you print the shape of it?

auspicious3000 avatar Jun 05 '20 10:06 auspicious3000

Yes, I can. But do you mean I should send it to you?

shape.txt

mhosein4 avatar Jun 05 '20 10:06 mhosein4

Just let me know the shape.

auspicious3000 avatar Jun 07 '20 06:06 auspicious3000

The shape of your metadata is (4, 3), but mine is (2,).

mhosein4 avatar Jun 07 '20 07:06 mhosein4

I mean the shape of the 3rd section

auspicious3000 avatar Jun 07 '20 07:06 auspicious3000

I'm sorry, my mistake. The third section consists of strings: the paths of the spectrograms. Like this ---> 's1\p1_1.npy', 's1\p1_2.npy', 's1\p1_3.npy'

mhosein4 avatar Jun 07 '20 07:06 mhosein4

"I can't understand what the third section is or how to generate it. What is the array that is highlighted in the picture?"

This was your original question. What is the shape of the 3rd section you were referring to?

auspicious3000 avatar Jun 07 '20 07:06 auspicious3000

The picture I sent shows your metadata. The shape of the 3rd section of your metadata is (3,). I want to generate my metadata like the one in the picture. Sorry if I didn't explain well.

mhosein4 avatar Jun 07 '20 07:06 mhosein4

There are definitely more than 3 elements in your highlighted area

auspicious3000 avatar Jun 07 '20 07:06 auspicious3000

I'm so sorry, my mistake again. The shapes are (90, 80), (89, 80), (75, 80), (109, 80). The metadata includes 4 speakers.

mhosein4 avatar Jun 07 '20 07:06 mhosein4

These are the spectrograms

auspicious3000 avatar Jun 07 '20 08:06 auspicious3000

So what is the array before it, in the second section? Can you send me the Python file? I'm so confused. Thanks again for your good support.

mhosein4 avatar Jun 07 '20 08:06 mhosein4

Again, the shape please.

Also, where did you get that metadata?

auspicious3000 avatar Jun 07 '20 08:06 auspicious3000

The shapes are (256,), (256,), (256,), (256,). I sent you an email, and you sent me your project.

mhosein4 avatar Jun 07 '20 08:06 mhosein4

Those are the speaker embeddings.

In that case, you already had the code to generate this. If not, you can write your own very easily. I don't keep the code, because it is too simple.

auspicious3000 avatar Jun 07 '20 08:06 auspicious3000
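
Putting the answers above together, each entry appears to be a list of [name, speaker embedding of shape (256,), mel-spectrogram of shape (frames, 80)]. Below is a minimal sketch of building such entries with shape checks. The helper `make_entry` and the constants are hypothetical, the sizes 256 and 80 are taken from the shapes reported in this thread, and the arrays are random placeholders: computing a real speaker embedding or spectrogram is not shown.

```python
import numpy as np

EMB_DIM = 256  # embedding size reported in this thread
N_MELS = 80    # number of mel bins reported in this thread


def make_entry(speaker_id, embedding, mel):
    """Build one [name, speaker embedding, mel-spectrogram] entry."""
    embedding = np.asarray(embedding, dtype=np.float32)
    mel = np.asarray(mel, dtype=np.float32)
    if embedding.shape != (EMB_DIM,):
        raise ValueError(f"expected embedding shape ({EMB_DIM},), got {embedding.shape}")
    if mel.ndim != 2 or mel.shape[1] != N_MELS:
        raise ValueError(f"expected mel shape (frames, {N_MELS}), got {mel.shape}")
    return [speaker_id, embedding, mel]


# Placeholder data: four speakers with the frame counts quoted above.
rng = np.random.default_rng(0)
entries = [
    make_entry(f"spk{i}", rng.standard_normal(EMB_DIM), rng.random((frames, N_MELS)))
    for i, frames in enumerate([90, 89, 75, 109])
]

for name, emb, mel in entries:
    print(name, emb.shape, mel.shape)
```

The resulting list can be written with `pickle.dump` to obtain a file with the same nested-list layout.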

@mhosein4 did you understand the metadata format? I'm trying to run this code now, and I can see that the "metadata.pkl" file in the repo is NOT the same as the metadata file that "make_metadata.py" generates.

In "metadata.pkl", for every singer there are:

  1. str for the id of the singer
  2. embedding
  3. mel-spec for the songs in the dataset

but generating a metadata file with "make_metadata.py" produces:

  1. str for the id of the singer
  2. embedding
  3. file names (of type string) of the songs in the dataset

so I can't use it... I saw in another issue that someone said the metadata for training and testing is different, but I can't understand how, or where in the code.

thanks!

amiteliav avatar Jul 29 '21 11:07 amiteliav
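
One way to bridge the two layouts described above is to load the spectrogram file that a path points to and store the array in place of the string. This is a minimal sketch, assuming a training-style entry looks like [id, embedding, path, path, ...] as described in this thread; the helper `to_test_entry` and its `utterance_index` parameter are hypothetical and not part of the repository.

```python
import os
import tempfile

import numpy as np


def to_test_entry(train_entry, root_dir, utterance_index=0):
    """Convert [id, embedding, path, ...] into [id, embedding, mel array]."""
    name, embedding, *paths = train_entry
    # Paths quoted in the thread use Windows separators; normalise them.
    rel_path = paths[utterance_index].replace("\\", os.sep)
    mel = np.load(os.path.join(root_dir, rel_path))
    return [name, np.asarray(embedding), mel]


# Demo with a temporary directory and a zero-filled placeholder spectrogram.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "s1"))
    np.save(os.path.join(root, "s1", "p1_1.npy"), np.zeros((90, 80), dtype=np.float32))
    train_entry = ["s1", np.zeros(256, dtype=np.float32), "s1\\p1_1.npy"]
    test_entry = to_test_entry(train_entry, root)

print(test_entry[0], test_entry[1].shape, test_entry[2].shape)
```

Applying the helper to every entry and pickling the resulting list would give a file in the [id, embedding, mel-spectrogram] layout.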

@amiteliav in case it's still relevant, you can find an end-to-end implementation in this repo/notebook: https://github.com/KnurpsBram/AutoVC_WavenetVocoder_GriffinLim_experiments/blob/master/AutoVC_WavenetVocoder_GriffinLim_experiments_17jun2020.ipynb

lisabecker avatar May 04 '22 07:05 lisabecker