Automatic_Speech_Recognition
TED-LIUM?
It would be great to have TED-LIUM dataset support, and then to compare against Mozilla DeepSpeech and DeepSpeech PyTorch.
@braindead OK, I will add support and performance results as soon as possible. Thanks for your advice.
Apparently I wasn't the only one with that question ^^ I just wrote a PoC script that builds the dataset: https://github.com/jupiter126/Create_Speech_Dataset It is still being tested, but feel free to use it if it is any help.
@jupiter126 Thank you very much for collecting the dataset, it will be of great help. Could I integrate it into my repo?
Actually, I don't understand much of TensorFlow yet, so I figured the best way I could help was to put together datasets for models to train on: please feel free to integrate it! Please note that I found a few more things to iron out. Since you have shown that this script reaches its objective of being helpful to others, I will update it. I also plan to include other datasets in it.
I encountered a few bugs in the latest test version I have here. I hope to have most of them ironed out and to update the git repo by tonight, so you have a version you can integrate ;) I say "I hope" because each iteration of testing the script takes quite a lot of time, so I spend more time on testing than on coding.
Just updated the script, should be easier to integrate now ;)
@jupiter126 Hi, I just ran your code, but I got some warnings or errors, like this:
<binary data>/3600: syntax error: invalid arithmetic operator (error token is "<binary data>/3600")
./pull.sh: line 83: <binary data>/3600: syntax error: operand expected (error token is "<binary data>/3600")
./pull.sh: line 83: /3600: syntax error: operand expected (error token is "/3600")
./pull.sh: line 83: <binary data>/3600: syntax error: operand expected (error token is "<binary data>
@jupiter126 So can you fix this? Once it is done, I will integrate it into my repo and add your name as author.
I'm running a complete test of a new version right now. I don't think I can easily get rid of the error itself: it happens on less than 2% of the lines, and I'm not sure what triggers it. I suspect some of the lines are not formatted in exactly the same way as the others. Maybe an alternate parsing method would help? What I've tried in the new version, however, is to hide the error messages by sending them to /dev/null (this does not fix the error itself). I will let you know the results as soon as the test run is finished.
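As a rough illustration of the "alternate parsing" idea, here is a minimal sketch that checks a field before it reaches the division by 3600, so that malformed lines are skipped and reported without triggering the arithmetic syntax error. The function name `seconds_to_hours` is hypothetical and not taken from `pull.sh`, and the sketch assumes the value being divided is meant to be a whole number of seconds, which the thread does not confirm.

```sh
#!/bin/sh
# Hypothetical helper (not from pull.sh): divide a duration by 3600 only
# when the input is a plain non-negative integer.
# Assumption: the field is a whole number of seconds.
seconds_to_hours() {
    value=$1

    # Reject empty input and anything containing a non-digit character,
    # which covers the binary junk seen in the error messages.
    case $value in
        ''|*[!0-9]*)
            printf 'skipping non-numeric duration field\n' >&2
            return 1
            ;;
    esac

    # Strip leading zeros so the shell does not read the value as octal.
    while [ "${value#0}" != "$value" ] && [ "${#value}" -gt 1 ]; do
        value=${value#0}
    done

    # Integer division, as in the original "/3600" expression.
    echo $(( value / 3600 ))
}
```

A caller could then run `hours=$(seconds_to_hours "$field") || continue` inside its read loop, which keeps the offending lines out of the arithmetic instead of only silencing the messages.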
The latest version does not show the errors anymore, but it still logs the source of each error, so I hope I will be able to track down the cause of the issue soon! I have posted the latest version and am running a complete test tonight. I will tell you about the results when the run is finished.
updated, new version should work!