
Creating own dataset

Open Wild1234 opened this issue 6 years ago • 4 comments

Good night.

I want to create my own dataset with my own labels. Is that possible with this repository?

Thanks

Wild1234, Apr 03 '18 22:04

Hi. Yes, you can use your own dataset.

You need to extract features from the raw data. To do this, take a look at the vggish lib here.

You can also see how I did this on the fly.

Hope this helps.

igor-panteleev, Apr 04 '18 09:04

Thanks!

But isn't it necessary to train a model with youtube 8M?

Lelo123, Apr 06 '18 19:04

Short answer: yes. Two models are used here: vggish as the feature extractor and youtube8m as the classifier.

So if you want to use different labels, you need to extract features using vggish and then train the youtube8m model on these features.
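For the label side of this, a minimal sketch of one way to turn custom label names into the integer indices a multi-label classifier is trained on. The file names, label names, and helper functions are hypothetical and are not part of vggish or youtube8m; the number of distinct labels would also have to match the number of classes the classifier is configured for.

```python
def build_label_vocab(clip_labels):
    """Map each distinct label name to a stable integer index (sorted by name)."""
    names = sorted({name for labels in clip_labels.values() for name in labels})
    return {name: index for index, name in enumerate(names)}


def encode_labels(clip_labels, vocab):
    """Replace label names with their integer indices for every clip."""
    return {
        clip: sorted(vocab[name] for name in labels)
        for clip, labels in clip_labels.items()
    }


# Hypothetical clips and labels, for illustration only.
clip_labels = {
    "clip_001.wav": ["dog_bark"],
    "clip_002.wav": ["siren", "dog_bark"],
    "clip_003.wav": ["speech"],
}

vocab = build_label_vocab(clip_labels)
encoded = encode_labels(clip_labels, vocab)

print(vocab)    # label name -> class index
print(encoded)  # clip -> list of class indices
```

Keeping the vocabulary sorted means the same label always gets the same index across runs, so features extracted later can be paired with consistent targets.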

igor-panteleev, Apr 10 '18 11:04

Hi,

I still have some questions about creating my own dataset.

The training script provided by youtube8m reads .tfrecord files. Do you know how to generate this format for audio?

Also, how do I add my custom labels when training the youtube8m model? Many thanks.

hei9gag, Nov 29 '18 10:11
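On the .tfrecord question, a rough sketch of how per-frame audio embeddings and integer labels could be written as `tf.train.SequenceExample` records. Several things here are assumptions to check against the youtube8m reader code you actually use: the feature keys `"id"`, `"labels"`, and `"audio"`, the 8-bit quantization over the range [-2, 2], and the `write_clips` and `quantize_frame` helpers, which are hypothetical names. If the embeddings already come out of a post-processing step as 8-bit values, the quantization step would be skipped.

```python
def quantize_frame(embedding, lo=-2.0, hi=2.0):
    """Clip floats to [lo, hi] and scale them to one byte each (0..255)."""
    scale = 255.0 / (hi - lo)
    return bytes(int((min(max(v, lo), hi) - lo) * scale) for v in embedding)


def write_clips(path, clips):
    """Write (clip_id, label_ids, frames) tuples to a TFRecord file.

    `frames` is a list of float embeddings, one per audio frame.
    TensorFlow is imported here so the rest of the sketch runs without it.
    """
    import tensorflow as tf

    with tf.io.TFRecordWriter(path) as writer:
        for clip_id, label_ids, frames in clips:
            # Clip-level fields: an identifier and the integer class labels.
            context = tf.train.Features(feature={
                "id": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[clip_id.encode("utf-8")])),
                "labels": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=label_ids)),
            })
            # Frame-level field: one bytes feature per quantized embedding.
            audio = tf.train.FeatureList(feature=[
                tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[quantize_frame(frame)]))
                for frame in frames
            ])
            example = tf.train.SequenceExample(
                context=context,
                feature_lists=tf.train.FeatureLists(feature_list={"audio": audio}),
            )
            writer.write(example.SerializeToString())


# Quantization alone, on a toy 5-value "embedding".
demo = quantize_frame([-3.0, -2.0, 0.0, 2.0, 5.0])
print(list(demo))  # [0, 0, 127, 255, 255]
```

With TensorFlow installed, something like `write_clips("train.tfrecord", [("clip_001", [0], frames)])` would produce a file; whether the training script accepts it depends on the keys and feature sizes its reader expects.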