deepspeech-german
Does not seem to work
- The error I get in Python mode:
(deepspeech.venv) marc@debian:~/Daten.2017/code/speechRecognition/versuch3$ python3 client.py DeepSpeech/model/output_graph.pb DeepSpeech/model/alphabet.txt DeepSpeech/model/lm.binary DeepSpeech/model/trie ../../methodisch_inkorrekt_Folge_116.wav
Loading model from file DeepSpeech/model/output_graph.pb
2018-05-13 16:33:31.937816: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.023s.
Loading language model from files DeepSpeech/model/lm.binary DeepSpeech/model/trie
Loaded language model in 14.681s.
Running inference.
2018-05-13 16:34:00.252102: E tensorflow/core/framework/op_segment.cc:53] Create kernel failed: Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=
- Console mode does not work because the CUDA libraries are missing.
- The NPM variant is not tested.
May I ask which versions of DeepSpeech and TensorFlow you used to generate the model?
I made two similar directories, one English (which works with the same command) and one German (which does not).
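The `NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3 ...>` message usually points to a graph that was exported with a newer TensorFlow than the one the client was built against. A rough way to compare the English and German model files without loading TensorFlow at all is to scan the raw `.pb` bytes for the attribute name. This is only a heuristic sketch: `graph_mentions_attr` is a hypothetical helper, not part of DeepSpeech or TensorFlow, and it assumes the attribute name is stored as plain bytes in the frozen graph.

```python
def graph_mentions_attr(pb_path, attr_name="identical_element_shapes",
                        chunk_size=1 << 20):
    """Return True if the raw bytes of the file contain attr_name.

    Hypothetical helper: a byte scan, not a real GraphDef parse.
    """
    needle = attr_name.encode("utf-8")
    keep = len(needle) - 1  # bytes carried over so matches across chunks are found
    tail = b""
    with open(pb_path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if needle in window:
                return True
            tail = window[-keep:] if keep else b""
```

If a call such as `graph_mentions_attr("model.de/output_graph.pb")` returns `True` while the same call on the English graph returns `False`, that would be consistent with the German graph having been exported by a newer TensorFlow than the client supports.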
I used the prebuilt client, which can be downloaded via:
python util/taskcluster.py --target /path/to/destination/folder
I also got some errors using the DeepSpeech client (I think it was the Python one) on my personal notebook. On my desktop work PC it ran fine, but not on the laptop. The native client (C++) worked on both, though.
I should add that I was using deepspeech 2.0 alpha 3, which is not 100% compatible with the trained models. I have not had time to debug it yet; it would require me to recompile the model, which does not seem trivial (it requires some dependencies).
@braindef Did you check that the command line parameters are correct? They differ between the C++ and the Python version. I also recommend posting error logs surrounded by three backticks (```); that way they are easier to read.
example
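To make the difference in parameter order concrete, here is a small sketch based only on the commands posted further down in this thread: the native `./deepspeech` binary is called there with the audio file last, while the Python `deepspeech` command takes the audio file as the second argument. `build_command` is a hypothetical helper, the file names are the ones used in this thread, and other DeepSpeech versions may well expect a different order.

```python
def build_command(client, model_dir, audio):
    """Build an argument list in the order used in this thread.

    Hypothetical helper; the orders are copied from the posted commands,
    not from the DeepSpeech documentation.
    """
    model = f"{model_dir}/output_graph.pb"
    alphabet = f"{model_dir}/alphabet.txt"
    lm = f"{model_dir}/lm.binary"
    trie = f"{model_dir}/trie"
    if client == "native":
        # C++ client: model, alphabet, lm, trie, audio
        return ["./deepspeech", model, alphabet, lm, trie, audio]
    if client == "python":
        # Python client: model, audio, alphabet, lm, trie
        return ["deepspeech", model, audio, alphabet, lm, trie]
    raise ValueError(f"unknown client: {client!r}")


if __name__ == "__main__":
    print(" ".join(build_command("native", "model", "sample-16k.wav")))
    print(" ".join(build_command("python", "model", "sample-16k.wav")))
```

Passing the list to `subprocess.run` instead of typing the command by hand would rule out a swapped argument as the cause of an error.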
@AtosNicoS Yes, it should be correct. I have two identical directories, model.en and model.de, in the DeepSpeech directory. When I use model.en everything works; when I go back (cd ..), change to model.de and run the same command again, I get the error in Python. I cannot test the native client because the CUDA libraries are missing in Debian stretch and Debian buster, and the native client seems to require CUDA 9 (libcudart.so.9.0) and cuDNN (libcudnn.so.7.something). That would be an issue for the main project, though: https://github.com/mozilla/DeepSpeech...
@ynop and @AtosNicoS, may I ask which operating systems you use? I spent many hours trying to get the native client working; I even reinstalled my computer with Debian stretch and then with Debian buster, but without success...
I have used Ubuntu 16.04 in a docker container
I am using ArchLinux and I hope to get a new stable release of deepspeech soon, as 1.1 does not build for me.
On Ubuntu I got the native client running, but now there is another issue: the recognized text has no spaces and seems to be much too short. I used a 5-minute sample at 16 kHz.
marc@ubuntu:~/speech/DeepSpeech.cpu$ ./deepspeech model/output_graph.pb model/alphabet.txt model/lm.binary model/trie ~/minkorrekt-16k.wav
TensorFlow: v1.6.0-16-gc346f2c
DeepSpeech: v0.2.0-alpha.5-0-g7cc8382
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-05-14 18:07:19.380423: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
ereidogetetshekteiseenoueliurmdiesersemeinweipriseidzeruverhesertesodevewintenekitpaceladsperzezooberueiedieiieneesanhenburevorgereinenesechsefonzzwenzessendetenzweiusnassedrexausdekuszebeschichtederzeitewissenschaftenmitwerifihdermeinereigesozertrelereworvuniermenschzuvorgewiesenesvorcheniorgenstralungderwisenschaftnügudaswöblegaufliebeeürinenuntonflugaufbesanzelstüteimasregenechte
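Before blaming the model for output like this, it is cheap to rule out the input file. The sketch below reads the WAV header with Python's standard `wave` module and checks it against what I assume the model expects: 16 kHz, mono, 16-bit PCM. That expectation is an assumption on my side, and `describe_wav` and `looks_like_expected_input` are hypothetical helper names.

```python
import wave


def describe_wav(path):
    """Read basic format information from a PCM WAV file header."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return {
            "sample_rate": rate,
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "duration_s": w.getnframes() / rate,
        }


def looks_like_expected_input(info, sample_rate=16000):
    """Assumed requirement: 16 kHz, mono, 16-bit samples."""
    return (
        info["sample_rate"] == sample_rate
        and info["channels"] == 1
        and info["sample_width_bytes"] == 2
    )
```

Running `describe_wav` on the 5-minute sample would also show whether the file really has the expected duration, which helps when the transcript looks far too short.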
The Python client still has the same problem:
marc@ubuntu:~/speech$ source deepspeech.venv/bin/activate
(deepspeech.venv) marc@ubuntu:~/speech$ deepspeech DeepSpeech.cpu/model/output_graph.pb ../minkorrekt-16k.wav DeepSpeech.cpu/model/alphabet.txt DeepSpeech.cpu/model/lm.binary DeepSpeech.cpu/model/trie
Loading model from file DeepSpeech.cpu/model/output_graph.pb
2018-05-14 18:09:17.241463: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.023s.
Loading language model from files DeepSpeech.cpu/model/lm.binary DeepSpeech.cpu/model/trie
Loaded language model in 14.919s.
Running inference.
2018-05-14 18:09:32.651152: E tensorflow/core/framework/op_segment.cc:53] Create kernel failed: Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=
The English speech model seems to work with the Python variant and some English text I took from John Oliver's show:
(deepspeech.venv) marc@ubuntu:~/speech/DeepSpeech.cpu$ deepspeech model.en/output_graph.pb ~/english-16k.wav model.en/alphabet.txt model.en/lm.binary model.en/trie
Loading model from file model.en/output_graph.pb
2018-05-14 18:23:14.959732: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.214s.
Loading language model from files model.en/lm.binary model.en/trie
Loaded language model in 13.278s.
Running inference.
kingdom us now sport the same her fat as a dear leader came john on his look was known as the chinese smugler haircut not too long ago in the region but now ill be known as a haircut every man in north korea must have like a gephhe'sthefam there is no solid evidence that that story is true but it is seductive because it sounds like it could be is it he be so the headline trunk to nato i inventevesquarelsyoudbelieve it sounds like something he would have claimed even though as of this tipping he has not a wharthilitmay not be true that all men had to get the same hagcotisskimjohngoonstatetv did an a seywesscored let us trim our hair in accordance with the socialist ligestyle and its weird when a verifiable truth is almost a strange as a wild room is a healthy richardgaorputtajebleinisourstory is comcletlyfalse but woifhthe truth was the eengastanconsentualnewtalalplaywithachinceller that would still be bear you wouldnt have to ixacoratethat and some time to the truth about life in north korea can be just as striking as the open legen'spristenteyoumayhaveseen claims on line that every teacher in north corea is obligated to play the accordin we could not confirm that although in trying to we did discoveredtat north korea juslovetheacordion to a surprising extent the country is full a here is in a courdiemfactoryahessomeschooltildrenplainlecaldiahiscimjonoonlookig at a accorded he is an harcombatexercisedweetheconrapansacrosspoilets and guesswhat ye its a hapiathartybuttheyalsohave a very popular so caued nothing to envy in the worlds that begins with a liingterskyisblu y heart is merry let the sound of a cortim'sragandthen there was this video of north koreans playing the last song that you would expect
Inference took 109.993s for 119.673s audio file.
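The timing reported by the Python client (109.993 s of inference for a 119.673 s audio file) can be expressed as a real-time factor, meaning processing time divided by audio duration. A value below 1 means the CPU build decodes slightly faster than real time. A minimal sketch of the arithmetic, with `real_time_factor` as a hypothetical helper name:

```python
def real_time_factor(inference_s, audio_s):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return inference_s / audio_s


# Numbers taken from the log above.
rtf = real_time_factor(109.993, 119.673)
print(f"real-time factor: {rtf:.3f}")  # about 0.92
```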
The English model also seems to work with the native client:
marc@ubuntu:~/speech/DeepSpeech.cpu$ ./deepspeech model.en/output_graph.pb model.en/alphabet.txt model.en/lm.binary model.en/trie ~/english-16k.wav
TensorFlow: v1.6.0-16-gc346f2c
DeepSpeech: v0.2.0-alpha.5-0-g7cc8382
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-05-14 18:27:56.589863: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
kingdom us now sport the same her fat as a dear leader came john on his look was known as the chinese smugler haircut not too long ago in the region but now ill be known as a haircut every man in north korea must have like a gephhe'sthefam there is no solid evidence that that story is true but it is seductive because it sounds like it could be is it he be so the headline trunk to nato i inventevesquarelsyoudbelieve it sounds like something he would have claimed even though as of this tipping he has not a wharthilitmay not be true that all men had to get the same hagcotisskimjohngoonstatetv did an a seywesscored let us trim our hair in accordance with the socialist ligestyle and its weird when a verifiable truth is almost a strange as a wild room is a healthy richardgaorputtajebleinisourstory is comcletlyfalse but woifhthe truth was the eengastanconsentualnewtalalplaywithachinceller that would still be bear you wouldnt have to ixacoratethat and some time to the truth about life in north korea can be just as striking as the open legen'spristenteyoumayhaveseen claims on line that every teacher in north corea is obligated to play the accordin we could not confirm that although in trying to we did discoveredtat north korea juslovetheacordion to a surprising extent the country is full a here is in a courdiemfactoryahessomeschooltildrenplainlecaldiahiscimjonoonlookig at a accorded he is an harcombatexercisedweetheconrapansacrosspoilets and guesswhat ye its a hapiathartybuttheyalsohave a very popular so caued nothing to envy in the worlds that begins with a liingterskyisblu y heart is merry let the sound of a cortim'sragandthen there was this video of north koreans playing the last song that you would expect
So @ynop, were you already able to extract text from a WAV file, or do you also get strange letters?
As you can see in the README, the WER is pretty bad, and that is on data similar to the training data (clean speech). I have tried some segments from news, but it doesn't really work. There are some correct words, but the result is not useful at all. There are a lot of random characters and missing spaces.
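For readers unfamiliar with the metric: WER (word error rate) is the word-level edit distance between the reference transcript and the recognized text, divided by the number of reference words. The sketch below is a generic textbook implementation, not the evaluation code used for the README numbers, and the example sentences are made up. It also shows why missing spaces are so costly: two words glued together count as one substitution plus one deletion.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the history of science"
    print(word_error_rate(ref, "the history of science"))  # 0.0
    print(word_error_rate(ref, "thehistory of science"))   # 0.5
    print(word_error_rate(ref, "thehistoryofscience"))     # 1.0
```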
Well, in my opinion it is better to have a solution that does not work yet than to use proprietary things :)
By the way, I'm from Aarau, so "chömmer au Schwiizerdütsch rede" (we could also speak Swiss German) :)
I will continue in English because there may be others who don't understand Swiss German...
Q1. What equipment do you have (CPUs / GPUs)?
Q2. What is your education? Are you studying computer science or linguistics?
Q3. Do you have contact with the Mozilla team?
Q4. Do you understand all the parts, such as:
- the trie (a digital tree of words)
- the architecture of the AI model
- how TensorFlow works
Q5. Which lectures helped you do this work?
Q6. Will you continue with this project?
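Regarding the trie in Q4: as a generic illustration (not the on-disk format of DeepSpeech's `trie` file, which I have not inspected), a trie stores words character by character so that a decoder can quickly ask "is this a word?" or "can this prefix still become a word?". The class and word list below are hypothetical.

```python
class Trie:
    """Minimal character trie; a sketch, not DeepSpeech's implementation."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def _find(self, text):
        node = self
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._find(prefix) is not None


vocab = Trie()
for w in ["wissen", "wissenschaft", "zeit"]:
    vocab.insert(w)
```

With this vocabulary, `vocab.has_prefix("wiss")` is true while `vocab.contains("wiss")` is false, which is the kind of check a decoder could use to prefer character sequences that can still become real words.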
Salii :)
I am studying computer science and mainly working in the field of AI for audio processing. Fortunately, I have access to a small GPU cluster to run such experiments. I have no contact with the Mozilla team; this was just a prototype for a project. I have a basic understanding of all the parts, but I mainly work with PyTorch (another DL framework). I have already done a few projects and lectures on speech recognition, deep learning, etc. during my studies. For example, I did the Machine Learning course by Andrew Ng on Coursera, which is a good entry point to machine learning. There are no plans to continue this project unless more German speech data becomes available (which is the main problem).
The Mozilla project will collect more German training data: https://twitter.com/FailDef/status/996073759697227776 (I used the Twitter link so I don't have to figure out how to insert pictures here :)
The Mozilla people have an IRC channel, #machinelearning on irc.mozilla.org, and are quite fast at answering questions...
Mozilla's Common Voice project started collecting German-language data at the beginning of June: https://voice.mozilla.org/de Let's improve the voice snippets together!
@braindef Have you found a solution for this issue? I ran into the exact same problem with @ynop's prebuilt models. The English models from mozilla/DeepSpeech work great, though.
As far as I understood, he did not have enough training examples to build a good German model. Maybe we have to wait for the Common Voice project. I guess they will release a good German model; the question is just when.