caffe-speech-recognition

The "spoken numbers" example does not work

Open aspats opened this issue 8 years ago • 4 comments

Hi @pannous ,

I am happy to have found an audio classification example like yours. However, the code has some problems and needs an update.

For now I am trying to use the "training spoken numbers" example, and I ran into the following doubts/problems:

  1. In the file "numbers_solver.prototxt" you are using net: "numbers_net.autoencoder.prototxt". "numbers_net.autoencoder.prototxt" defines the training and testing list files ("train_index_256x256.txt", "test_index_256x256.txt"), but those files do not exist. I fixed this by setting net: "numbers_net.prototxt" in "numbers_solver.prototxt". After that step I could start creating the caffe model.

  2. When I tried to run the backend server with "recognition-server.py", I got this error:

         ... net = caffe.Net(model, weights)
         Traceback (most recent call last):
           File "", line 2, in
         Boost.Python.ArgumentError: Python argument types in
             Net.__init__(Net, str, str)
         did not match C++ signature:
             __init__(boost::python::api::object, std::string, std::string, int)
             __init__(boost::python::api::object, std::string, int)

  3. It is also unclear which image size is intended: some code uses the original 512x512 images, while other code reduces them to 256x256. I used the original images to create the model, but "recognition-server.py" and "record.py" transform the image.

  4. I would also like to get the original audio files for "spoken numbers", and I would like to know how you converted the wav files to png.
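Regarding item 2: both C++ signatures in the error message end in an int, which suggests that this pycaffe build expects the phase as a trailing argument. The following is a minimal sketch, assuming pycaffe is installed and exposes the `caffe.TEST` constant. The helper name `load_net_for_inference` and its `caffe_module` parameter are hypothetical and not part of the repository.

```python
def load_net_for_inference(model_path, weights_path, caffe_module=None):
    """Load a trained net, passing the phase explicitly.

    caffe_module is only a hook for trying this without pycaffe installed;
    by default the real `caffe` package is imported.
    """
    if caffe_module is None:
        import caffe as caffe_module  # requires a working pycaffe build
    # Matches the four-argument signature listed in the error message:
    # __init__(object, std::string, std::string, int)
    return caffe_module.Net(model_path, weights_path, caffe_module.TEST)
```

In "recognition-server.py" this would correspond to changing the call to `caffe.Net(model, weights, caffe.TEST)`, provided the installed pycaffe accepts the phase as the third positional argument.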

I would be happy to get an answer from you. I really like your audio classification example; I just think it needs an update.

Thanks!

aspats avatar Sep 02 '15 14:09 aspats

Hi,

I do have the same/similar issue. Yesterday I

  • freshly cloned caffe and caffe-speech-recognition from git,
  • built caffe,
  • downloaded http://pannous.net/spoken_numbers.tar and extracted it into the caffe-speech-recognition root directory,
  • started ./train.sh and stumbled across issue 1) of the previous poster.

After implementing the above fixes I now get the issue from this thread: https://github.com/pannous/caffe-speech-recognition/issues/1 :

    [...]
    I0816 14:41:04.538826 3856 layer_factory.hpp:77] Creating layer alpha
    I0816 14:41:04.538861 3856 net.cpp:100] Creating Layer alpha
    I0816 14:41:04.538871 3856 net.cpp:408] alpha -> data
    I0816 14:41:04.538889 3856 net.cpp:408] alpha -> label
    I0816 14:41:04.538908 3856 image_data_layer.cpp:38] Opening file train_index.txt
    I0816 14:41:04.539526 3856 image_data_layer.cpp:58] A total of 2049 images.
    E0816 14:41:04.539546 3856 io.cpp:80] Could not open or find file spoken_numbers/3_Princess_220.wav.png 3
    F0816 14:41:04.539655 3856 image_data_layer.cpp:72] Check failed: cv_img.data Could not load spoken_numbers/3_Princess_220.wav.png 3
    [...]

It looks to me as if the data/label line of the index file is not split properly: the label "3" ends up as part of the file name. The file is definitely there. Is this an issue with the version of caffe being too recent and handling the index file differently? If so, which version of caffe is known to work with your setup?
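One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the installed Caffe version splits each index line at the last space character only, then a line whose path and label are separated by a tab is never split, and the whole line is used as the file name. The sketch below mimics that assumed behaviour in Python so the index files can be checked. The function names are hypothetical.

```python
def split_index_line(line):
    """Split 'path label' at the last space only (assumed Caffe behaviour)."""
    line = line.rstrip("\r\n")
    pos = line.rfind(" ")
    if pos == -1:
        # No space found: the whole line is treated as the path.
        return line, None
    return line[:pos], line[pos + 1:]


def tab_separated_lines(index_path):
    """Return (line number, raw line) for entries whose path still contains a tab."""
    bad = []
    with open(index_path) as handle:
        for number, line in enumerate(handle, start=1):
            path, _label = split_index_line(line)
            if "\t" in path:
                bad.append((number, line.rstrip("\r\n")))
    return bad
```

Calling `tab_separated_lines("train_index.txt")` returns an empty list when every entry uses a space as the separator.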

Cheers, Sebastian

sebastian-lapuschkin avatar Aug 16 '16 12:08 sebastian-lapuschkin

Hi, this demo code is two years old. Updating the code or data to the current caffe version / requirements shouldn't be too hard, though.

pannous avatar Aug 16 '16 13:08 pannous

Hi pannous,

first let me thank you for your swift reply yesterday.

For now I went the lazy way: I am running caffe-rc2 from https://github.com/BVLC/caffe/archive/rc2.zip and modified numbers_solver.prototxt so that numbers_net.prototxt is used (just switch comment/uncomment in lines 2 and 3). The net used by default, numbers_net.autoencoder.prototxt, refers to training data and index files that are missing.

This seems to work (it is training).

sebastian-lapuschkin avatar Aug 17 '16 13:08 sebastian-lapuschkin

I also found another way around the "3_Princess_220.wav.png file not found" error. I did what Sebastian did and edited numbers_solver.prototxt by uncommenting/commenting lines 2 and 3 so that numbers_net.prototxt is used.

I also edited train_index.txt and test_index.txt and replaced every tab with a single space. So the first line of train_index.txt will be "/spoken_numbers/3_Princess_220.wav.png 3" and the line after that will be "/spoken_numbers/6_Allison_60.wav.png 6" etc...
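The tab replacement can also be scripted instead of done by hand. This is a minimal sketch, assuming the index files are plain text with one "path, separator, label" entry per line. The function names are hypothetical, and the script rewrites the file in place, so a backup copy is advisable.

```python
import re


def tabs_to_single_space(text):
    """Replace each run of tab characters with one space."""
    return re.sub(r"\t+", " ", text)


def convert_index_file(path):
    """Rewrite the file in place; return True if anything changed."""
    with open(path) as handle:
        original = handle.read()
    converted = tabs_to_single_space(original)
    if converted != original:
        with open(path, "w") as handle:
            handle.write(converted)
    return converted != original
```

Running `convert_index_file("train_index.txt")` and `convert_index_file("test_index.txt")` from the repository root would cover both files mentioned here.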

After that everything seems to be working.

tomevang avatar Aug 02 '17 15:08 tomevang