A-Hackers-AI-Voice-Assistant Can I also use mp3 files for training instead of wav files?

Hello, first of all, nice work! I wanted to ask whether I could use mp3 files instead of wav files and, if that works, which lines I would have to change.

Apr 09 '21 16:04 Jochen-sys

Hello @Jochen-sys, I think that's not possible.

Apr 09 '21 17:04 CracKCatZ

@Jochen-sys because the spectrograms are made from the wav files.

Apr 09 '21 17:04 CracKCatZ

@CracKCatZ Thanks for the quick answer. But I can also create a spectrogram from an mp3 file. Admittedly, I have only tested it with TensorFlow, and I know it works there. If it didn't work with PyTorch, would it be possible to do that one task with TensorFlow and the rest with PyTorch?

Apr 09 '21 18:04 Jochen-sys

@Jochen-sys Hmm, good question. I actually don't know if this is possible, but I think yes: you can do one part with TensorFlow and the other with PyTorch. You just need to feed the spectrograms into PyTorch somehow.

Apr 09 '21 20:04 CracKCatZ

@Jochen-sys But why would you want to use mp3 files instead of wav files? Wav files are much easier to handle and format :)

Apr 09 '21 20:04 CracKCatZ

@CracKCatZ First of all, I don't have much storage on my computer. But the bigger problem is my GPU, so I have to train with Google Colab. Wav files are too big; mp3 files are much smaller. Another idea was to upload only the spectrograms to Google Colab, so the files wouldn't be too big. Is there a quality difference between wav and mp3? I know that wav files have better quality, but I don't really think this matters much in training. Or maybe mp3 is even better, because the model has to learn to transcribe lower-quality audio.

Apr 10 '21 08:04 Jochen-sys
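
For the storage question above, a rough size estimate can help. The sketch below assumes uncompressed 16-bit mono PCM at 16 kHz for wav and a constant bitrate of 64 kbit/s for mp3. These values, and the function names `wav_bytes` and `mp3_bytes`, are illustrative assumptions, not settings taken from this project. File headers and tags are ignored.

```python
def wav_bytes(seconds, sample_rate=16000, bits_per_sample=16, channels=1):
    """Approximate size of uncompressed PCM audio (header ignored)."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels


def mp3_bytes(seconds, bitrate_kbps=64):
    """Approximate size of constant-bitrate mp3 audio (tags ignored)."""
    return seconds * bitrate_kbps * 1000 // 8


if __name__ == "__main__":
    clip_seconds = 10  # assumed clip length, for illustration only
    print(wav_bytes(clip_seconds), mp3_bytes(clip_seconds))  # prints: 320000 80000
```

Under these assumptions the mp3 clip takes about a quarter of the space of the wav clip; other bitrates or sample rates shift that ratio.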

@Jochen-sys mp3 files are like any normal files; wav files (wave files) are constructed differently, and they look quite different too, because every sound is displayed as a wave, which is not the case for mp3s. I don't know how it would change the performance of the model or the training. You can add me on Discord: SheeeshForce#8083

Apr 10 '21 09:04 CracKCatZ

@CracKCatZ Thanks for explaining. I wanted to test engine.py, but I got the error "ImportError: cannot import name 'imsave'". imsave is from scipy.misc, and according to Stack Overflow it should be replaced with imageio. Now I'm confused, because I thought it should work with imsave. Could you help me out, please?

Apr 10 '21 15:04 Jochen-sys

@Jochen-sys I am actually not familiar with imsave and imageio

Apr 10 '21 15:04 CracKCatZ

@CracKCatZ Ok but can you run engine.py without problems?

Apr 10 '21 16:04 Jochen-sys

@Jochen-sys Not at the moment, because I have to switch to Linux to install the ctcdecoder.

Apr 10 '21 16:04 CracKCatZ

@CracKCatZ OK, I'm sorry, I'm an idiot; I fixed it. My problem was that I thought neuralnet was a regular PyPI package and not a custom, self-written one. Why did you give scripts or folders the same names as existing packages on PyPI :-) (sadly there is no laughing smiley)?

Apr 10 '21 16:04 Jochen-sys

@CracKCatZ Where exactly are the spectrograms produced? There is so much code dealing with spectrograms that I can't find the exact place.

Apr 11 '21 08:04 Jochen-sys

@CracKCatZ Which version of ctcdecode do you use? (Mine worked a few days ago, but then it failed.) What is the ken_lm file for? Is this the file that produced such a good transcription in the video?

Apr 15 '21 07:04 Jochen-sys

@Jochen-sys I wasn't able to test ctcdecode yet.

Apr 15 '21 12:04 CracKCatZ

@CracKCatZ OK, got it. Do I have to use the ckpt file to train from a checkpoint (as the argument for --load_model_from)? And how can I get a zip file or a ckpt file at the end of training? I think I need a zip file for transcription with the microphone, but I would also like to get a ckpt file for further training in the future.

Apr 17 '21 12:04 Jochen-sys

Hey @Jochen-sys, yes, you have to :) The model will be saved automatically as a ckpt file :) Yes, I think you need the zip file too (by the way, I need one as well), because I think without the zip we get no outputs. Could you please add me on Discord so we can talk there and speed up communication? Name: SheeeshForce1#8083

Apr 18 '21 13:04 NoCodeAvaible

Sorry, I don't have Discord. OK, thanks. I'm getting the following error when I use the argument --load_model_from speechrecognition.ckpt: RuntimeError: Error(s) in loading state_dict for SpeechModule: Unexpected key(s) in state_dict: "model.cnn.0.weight", "model.cnn.0.bias", "model.cnn.1.norm.weight", "model.cnn.1.norm.bias", "model.dense.0.weight", "model.dense.0.bias", "model.dense.1.weight", "model.dense.1.bias", "model.dense.4.weight", "model.dense.4.bias", "model.dense.5.weight", "model.dense.5.bias", "model.lstm.weight_ih_l0", "model.lstm.weight_hh_l0", "model.lstm.bias_ih_l0", "model.lstm.bias_hh_l0", "model.layer_norm2.weight", "model.layer_norm2.bias", "model.final_fc.weight", "model.final_fc.bias".

Does anyone know what this means?

Then I tried to use the argument --resume_from_checkpoint (sorry, I don't know what this argument does) instead of --load_model_from. But this doesn't work either. I get the following error: checkpoint_callbacks[-1].best_model_path = checkpoint['checkpoint_callback_best_model_path'] KeyError: 'checkpoint_callback_best_model_path'

Apr 20 '21 11:04 Jochen-sys
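
An "Unexpected key(s) in state_dict" error means the checkpoint contains parameter names that the module being loaded does not define. Below is a minimal sketch of that comparison using plain lists of key names. `diff_state_dict_keys` is a hypothetical helper, and the shortened key lists are made up for illustration. With PyTorch, the real names would typically come from `module.state_dict().keys()` and from the "state_dict" entry of a Lightning checkpoint.

```python
def diff_state_dict_keys(module_keys, checkpoint_keys):
    """Return (missing, unexpected) key names, as a strict load would report them."""
    module_keys, checkpoint_keys = set(module_keys), set(checkpoint_keys)
    missing = sorted(module_keys - checkpoint_keys)      # module expects, checkpoint lacks
    unexpected = sorted(checkpoint_keys - module_keys)   # checkpoint has, module lacks
    return missing, unexpected


if __name__ == "__main__":
    # Invented example: the checkpoint names carry a "model." prefix.
    module_keys = ["cnn.0.weight", "cnn.0.bias"]
    checkpoint_keys = ["model.cnn.0.weight", "model.cnn.0.bias"]
    missing, unexpected = diff_state_dict_keys(module_keys, checkpoint_keys)
    print("missing:", missing)
    print("unexpected:", unexpected)
```

Comparing the two lists shows whether the mismatch is a naming prefix, a changed architecture, or a checkpoint written by a different library version.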

OK, I fixed the first error. My version of pytorch_lightning was too old. But what does the --resume_from_checkpoint argument mean?

Apr 21 '21 15:04 Jochen-sys

@Jochen-sys It means that you pass a checkpoint file, either as the default value or through the terminal (set required to false if you set it as the default), and training is resumed from this checkpoint. You basically use it to resume training if you stopped it, if you want to test the checkpoint (the model that you create in optimize_graph.py), or if your PC shut down for an unknown reason during training.

Apr 21 '21 16:04 CracKCatZ

@CracKCatZ Do you know why the loss could be "nan"? At the beginning it was a real float, but now I only see this string there. I researched this but didn't find a convincing cause.

Apr 24 '21 17:04 Jochen-sys

@Jochen-sys Yes, CUDA and cuDNN are not installed correctly. You can search YouTube for videos on a correct CUDA and cuDNN installation :)

Apr 24 '21 17:04 CracKCatZ

@CracKCatZ Oh OK, that's interesting, thanks. I'm using my CPU.

Apr 24 '21 17:04 Jochen-sys
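
Whatever the cause of the nan loss turns out to be, it helps to find the first step at which the loss stops being a finite number, so that the offending batch can be inspected. Below is a minimal sketch with plain floats. `first_non_finite` is a hypothetical helper and the loss values are invented. With a CTC-based model, one possible cause worth checking is a transcript that is too long for the model's output sequence, which yields an infinite loss; `torch.nn.CTCLoss` has a `zero_infinity` option for that case.

```python
import math


def first_non_finite(losses):
    """Return the index of the first nan/inf loss value, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


if __name__ == "__main__":
    logged_losses = [2.31, 1.87, float("nan"), 1.52]  # invented values
    print(first_non_finite(logged_losses))  # prints: 2
```

Once the step is known, the audio file and transcript of that batch can be checked by hand.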

@Jochen-sys are you working with mp3 files now?

Apr 24 '21 19:04 CracKCatZ

On Windows, no, because mp3 files don't work there, but it works on Linux. I'm training on my Windows system; I only have Linux as a VM.

Apr 25 '21 08:04 Jochen-sys
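
If mp3 decoding is the platform-dependent part, one workaround is to convert the mp3 files to wav once, right before training (for example on the Colab machine, after uploading the smaller mp3 files). Below is a sketch that shells out to ffmpeg, assuming ffmpeg is installed and on the PATH. `build_ffmpeg_cmd` and `convert_mp3_to_wav` are hypothetical helper names, and the 8000 Hz mono target is an assumption that should be matched to whatever sample rate the training code expects.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src, dst, sample_rate=8000):
    """Build an ffmpeg command: overwrite output, resample, downmix to mono."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-ar", str(sample_rate), "-ac", "1", str(dst)]


def convert_mp3_to_wav(src, out_dir, sample_rate=8000):
    """Convert one mp3 file to wav in out_dir and return the new path."""
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = out_dir / (src.stem + ".wav")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_cmd(src, dst, sample_rate), check=True)
    return dst


if __name__ == "__main__":
    # Only prints the command; nothing is converted here.
    print(" ".join(build_ffmpeg_cmd("clip.mp3", "clip.wav")))
```

Converting on the training machine keeps the upload small while leaving the wav-based data pipeline unchanged.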

@CracKCatZ Do you know what this ken_lm is for and where I could get it? Is this the file that improved the transcription in the video so much? If not, which file improved the transcription so much?

May 01 '21 08:05 Jochen-sys

@Jochen-sys Have you already tested the speech recognition?

May 01 '21 09:05 CracKCatZ

@CracKCatZ Yes, with the zip model. But it's not very good. I remember that in the video he used something else, too, to get good results.

May 01 '21 09:05 Jochen-sys

@Jochen-sys Hold on, did you use the PortAudio library? I think this library is required and can give better results. Could you please tell me whether you already had PortAudio installed when you started working with this project, or whether you had to install it?

May 01 '21 20:05 CracKCatZ

@CracKCatZ Sorry for the late response. Yes, I think so. To come back to the loss=nan problem: why isn't the loss nan when I train on the same wav file for a few epochs? Could I try training on only one wav file per training run, or would the result be worse?

May 08 '21 11:05 Jochen-sys
