A-Hackers-AI-Voice-Assistant

Can I also use mp3 files for training instead of wav files?

Open • Jochen-sys opened this issue 3 years ago • 32 comments

Hello, and first of all, nice work! I wanted to ask whether I could use mp3 files instead of wav files and, if that works, which lines I would have to change.

Jochen-sys, Apr 09 '21 16:04

Hello @Jochen-sys, I don't think that's possible.

CracKCatZ, Apr 09 '21 17:04

@Jochen-sys because the spectrograms are made from the wav files.

CracKCatZ, Apr 09 '21 17:04

@CracKCatZ Thanks for the quick answer. But I can also create a spectrogram from an mp3 file. Admittedly, I have only tested it with TensorFlow, but I know it works there. If it doesn't work with PyTorch, would it be possible to do that one task with TensorFlow and the rest with PyTorch?

Jochen-sys, Apr 09 '21 18:04
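Whatever tool produces it, a spectrogram is computed from decoded audio samples, not from the file container, so an mp3 only has to be decoded into a float array first (for example with torchaudio or librosa, depending on the installed audio backend). A minimal NumPy sketch of a magnitude spectrogram (short-time Fourier transform); the window and hop sizes and the 16 kHz sample rate are illustrative assumptions, not the project's actual parameters.

```python
import numpy as np

def magnitude_spectrogram(samples: np.ndarray, n_fft: int = 400, hop: int = 160) -> np.ndarray:
    """Return |STFT| with shape (n_fft // 2 + 1, n_frames).

    `samples` is a 1-D float array of decoded audio; it does not matter
    whether it was decoded from a wav or an mp3 file.
    """
    if samples.ndim != 1:
        raise ValueError("expected mono 1-D samples")
    if len(samples) < n_fft:
        # Zero-pad very short clips so at least one frame exists.
        samples = np.pad(samples, (0, n_fft - len(samples)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    # Slice overlapping frames and apply the window to each one.
    frames = np.stack([samples[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # rfft over the last axis gives n_fft // 2 + 1 frequency bins per frame.
    return np.abs(np.fft.rfft(frames, axis=-1)).T

# Usage: one second of a 440 Hz tone at an assumed 16 kHz sample rate.
sr = 16000
t = np.arange(sr) / sr
spec = magnitude_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (201, 98)
```

With a 400-point FFT at 16 kHz each bin is 40 Hz wide, so the 440 Hz tone peaks in bin 11.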

@Jochen-sys Hmm, good question. I actually don't know for sure, but I think yes, you can do one part with TensorFlow and the other with PyTorch; you just need to feed the spectrograms into PyTorch somehow.

CracKCatZ, Apr 09 '21 20:04
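One way to feed spectrograms from one framework into the other is a framework-neutral file format such as NumPy's `.npy`. A minimal sketch, assuming the spectrogram is already a NumPy array (an eager TensorFlow tensor can be converted with `.numpy()`); the file name is hypothetical, and on the PyTorch side `torch.from_numpy` would turn the loaded array into a tensor.

```python
import os
import tempfile

import numpy as np

def save_spectrogram(path: str, spec: np.ndarray) -> None:
    # .npy stores dtype and shape, so nothing framework-specific is needed.
    np.save(path, spec)

def load_spectrogram(path: str) -> np.ndarray:
    # In a PyTorch pipeline this would be wrapped as torch.from_numpy(...).
    return np.load(path)

# Stand-in for a spectrogram computed with TensorFlow (e.g. tensor.numpy()).
spec = np.random.default_rng(0).random((64, 120), dtype=np.float32)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "clip_0001.npy")  # hypothetical file name
    save_spectrogram(path, spec)
    restored = load_spectrogram(path)
print(restored.shape, restored.dtype)  # (64, 120) float32
```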

@Jochen-sys But why would you want to use mp3 files instead of wav files? Wav files are much easier to handle and format :)

CracKCatZ, Apr 09 '21 20:04

@CracKCatZ First of all, I don't have much storage on my computer. But the bigger problem is my GPU, so I have to train on Google Colab. Wav files are too big; mp3 files are much smaller. Another idea was to upload only the spectrograms to Google Colab, so the files wouldn't be too big. Is there a quality difference between wav and mp3? I know that wav files have better quality, but I don't think this matters much for training. Or maybe it's even better, because then the model has to learn to transcribe lower-quality audio.

Jochen-sys, Apr 10 '21 08:04
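If only precomputed spectrograms are uploaded to Colab, storing them compressed and at half precision can shrink them further. A sketch using NumPy's standard `np.savez` / `np.savez_compressed`; it assumes float16 is precise enough for training, which is worth checking on a validation set, and the batch of spectrograms below is random stand-in data.

```python
import io
from typing import Dict

import numpy as np

def packed_size(arrays: Dict[str, np.ndarray], compress: bool, half: bool) -> int:
    """Serialize arrays into an in-memory .npz file and return its size in bytes."""
    buf = io.BytesIO()
    data = {k: (v.astype(np.float16) if half else v) for k, v in arrays.items()}
    saver = np.savez_compressed if compress else np.savez
    saver(buf, **data)
    return len(buf.getvalue())

# Hypothetical batch of log-spectrograms (64 bins x 200 frames each).
rng = np.random.default_rng(0)
specs = {f"clip_{i}": np.log1p(rng.random((64, 200))).astype(np.float32) for i in range(10)}

raw = packed_size(specs, compress=False, half=False)
small = packed_size(specs, compress=True, half=True)
print(raw, small, f"{small / raw:.0%}")
```

Rounding to float16 is itself a small, lossy quality trade-off, so it is a choice of the same kind as wav versus mp3.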

@Jochen-sys mp3 files and wav (wave) files are constructed differently and look quite different inside: a wav file usually stores every sound directly as a sampled waveform, while an mp3 file doesn't. I don't know how it would change the performance of the model or the training. You can add me on Discord: SheeeshForce#8083

CracKCatZ, Apr 10 '21 09:04
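To make the difference concrete: a typical wav file is a small header followed by raw PCM samples, which Python's standard-library `wave` module can read directly, while an mp3 file stores compressed frames that need a decoder (such as ffmpeg) before a spectrogram can be computed. A self-contained sketch that writes and reads back a short 16-bit mono wav in memory; the sample rate and tone are arbitrary choices for the example.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # arbitrary choice for this example

# Write 0.1 s of a 440 Hz tone as 16-bit mono PCM into an in-memory wav file.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 2 bytes = 16-bit samples
    w.setframerate(SAMPLE_RATE)
    n = SAMPLE_RATE // 10
    pcm = [int(10000 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)) for i in range(n)]
    w.writeframes(struct.pack(f"<{n}h", *pcm))

# Reading it back needs no decoder: the frames are the raw samples.
buf.seek(0)
with wave.open(buf, "rb") as r:
    frames = r.readframes(r.getnframes())
    samples = struct.unpack(f"<{r.getnframes()}h", frames)
    print(r.getnchannels(), r.getsampwidth(), r.getframerate(), len(samples))  # 1 2 8000 800

print(len(buf.getvalue()) - len(frames))  # header size: 44 bytes for this simple file
```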

@CracKCatZ Thanks for explaining. I wanted to test engine.py, but I got the error "ImportError: cannot import name 'imsave'". imsave comes from scipy.misc, and according to Stack Overflow it should come from imageio instead. Now I'm confused, because I thought it should work with imsave. Could you please help me out there?

Jochen-sys, Apr 10 '21 15:04

@Jochen-sys I'm actually not familiar with imsave and imageio.

CracKCatZ, Apr 10 '21 15:04

@CracKCatZ OK, but can you run engine.py without problems?

Jochen-sys, Apr 10 '21 16:04

@Jochen-sys Not at the moment, because I have to switch to Linux to install ctcdecode.

CracKCatZ, Apr 10 '21 16:04

@CracKCatZ OK, I'm sorry, I'm an idiot; I fixed it. My problem was that I thought neuralnet was a regular PyPI package and not one written specifically for this project. Why did you name scripts and folders after existing PyPI packages? :-) (Sadly there is no laughing smiley.)

Jochen-sys, Apr 10 '21 16:04
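A quick way to check which `neuralnet` Python will actually import (the project's own folder or an installed package with the same name) is to ask the import system where it would load the module from. A small sketch using the standard-library `importlib.util.find_spec`; `json` is used in the example only so it runs anywhere, and from the repository you would pass `"neuralnet"` instead.

```python
import importlib.util
from typing import Optional

def module_origin(name: str) -> Optional[str]:
    """Return where Python would import `name` from, or None if it is not found.

    Entries earlier on sys.path (e.g. the running script's directory) win,
    so a local folder can shadow a PyPI package with the same name.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Namespace packages (folders without __init__.py) have no single origin;
    # report their search locations instead.
    if spec.origin in (None, "namespace") and spec.submodule_search_locations:
        return str(list(spec.submodule_search_locations))
    return spec.origin

print(module_origin("json"))  # path to the stdlib json package's __init__.py
print(module_origin("surely_not_installed_xyz"))  # None
```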

@CracKCatZ Where exactly are the spectrograms produced? There is so much code dealing with spectrograms that I can't find the exact place.

Jochen-sys, Apr 11 '21 08:04

@CracKCatZ Which version of ctcdecode do you use? (Mine worked a few days ago, but then it failed.) What is the ken_lm file? Is this the file that produced such a good transcription in the video?

Jochen-sys, Apr 15 '21 07:04

@Jochen-sys I haven't been able to test ctcdecode yet.

CracKCatZ, Apr 15 '21 12:04

@CracKCatZ OK, got it. Do I have to use the ckpt file to train from a checkpoint (as the argument for --load_model_from)? And how do I get a zip file or a ckpt file at the end of training? I think I need a zip file for transcribing from the microphone, but I would also like a ckpt file for further training in the future.

Jochen-sys, Apr 17 '21 12:04

Hey @Jochen-sys, yes, you do :) The model is saved automatically as a ckpt file :) And yes, I think you need a zip file too (by the way, I need one as well), because I think without the zip we get no output. Could you please add me on Discord so we can talk there and speed up communication? Name: SheeeshForce1#8083

NoCodeAvaible, Apr 18 '21 13:04

Sorry, I don't have Discord. OK, thanks. I'm getting the following error when I use the argument --load_model_from speechrecognition.ckpt: RuntimeError: Error(s) in loading state_dict for SpeechModule: Unexpected key(s) in state_dict: "model.cnn.0.weight", "model.cnn.0.bias", "model.cnn.1.norm.weight", "model.cnn.1.norm.bias", "model.dense.0.weight", "model.dense.0.bias", "model.dense.1.weight", "model.dense.1.bias", "model.dense.4.weight", "model.dense.4.bias", "model.dense.5.weight", "model.dense.5.bias", "model.lstm.weight_ih_l0", "model.lstm.weight_hh_l0", "model.lstm.bias_ih_l0", "model.lstm.bias_hh_l0", "model.layer_norm2.weight", "model.layer_norm2.bias", "model.final_fc.weight", "model.final_fc.bias".

Does anyone know what this means?

Then I tried the argument --resume_from_checkpoint (sorry, I don't know what this argument does) instead of --load_model_from. But that doesn't work either. I get the following error: checkpoint_callbacks[-1].best_model_path = checkpoint['checkpoint_callback_best_model_path'] KeyError: 'checkpoint_callback_best_model_path'

Jochen-sys, Apr 20 '21 11:04
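"Unexpected key(s)" means the checkpoint contains parameter names that the module being loaded does not have ("Missing key(s)" would be the reverse). Comparing the two key sets usually reveals the pattern, for example an extra `model.` prefix. A pure-Python sketch that works on any two collections of names (in PyTorch these would be `module.state_dict()` and, for a PyTorch Lightning checkpoint, its `"state_dict"` entry); the key names below are made up apart from the `model.` prefix in the error above, and stripping a prefix is only a workaround when the sets differ by nothing else.

```python
from typing import Dict, Iterable, List, Tuple

def diff_keys(expected: Iterable[str], found: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) key names, sorted for readability."""
    expected_set, found_set = set(expected), set(found)
    return sorted(expected_set - found_set), sorted(found_set - expected_set)

def strip_prefix(state: Dict[str, object], prefix: str) -> Dict[str, object]:
    """Drop `prefix` from every key that starts with it; leave other keys alone."""
    return {(k[len(prefix):] if k.startswith(prefix) else k): v for k, v in state.items()}

# Hypothetical example: the module expects bare names, the checkpoint adds "model.".
module_keys = ["cnn.0.weight", "cnn.0.bias", "final_fc.weight"]
checkpoint = {"model.cnn.0.weight": 1, "model.cnn.0.bias": 2, "model.final_fc.weight": 3}

print(diff_keys(module_keys, checkpoint))
print(diff_keys(module_keys, strip_prefix(checkpoint, "model.")))  # ([], [])
```

In this thread the actual fix turned out to be upgrading pytorch_lightning, so a key diff like this is a diagnostic step rather than a guaranteed remedy.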

OK, I fixed the first error: my version of pytorch_lightning was too old. But what does the --resume_from_checkpoint argument do?

Jochen-sys, Apr 21 '21 15:04

@Jochen-sys It means that you pass a checkpoint file, either as the default value or through the terminal (set required to False if you set it as the default), and training is resumed from that checkpoint. You basically use it to resume training if you stopped it, if you want to test the checkpoint (the model that you create in optimize_graph.py), or if your PC shuts down for some reason during training.

CracKCatZ, Apr 21 '21 16:04
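The "default value or terminal" pattern described above can be sketched with `argparse`: the checkpoint path is optional, falls back to a default, and training resumes only when a path is set. The flag name matches the one discussed in this thread, but the parser below is a hypothetical stand-in, not the project's actual training script.

```python
import argparse
from typing import List, Optional

def build_parser(default_ckpt: Optional[str] = None) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="hypothetical training entry point")
    # required=False so the default (or None) is used when the flag is omitted.
    parser.add_argument("--resume_from_checkpoint", default=default_ckpt, required=False,
                        type=str, help="path to a .ckpt file to resume training from")
    return parser

def plan(argv: List[str], default_ckpt: Optional[str] = None) -> str:
    args = build_parser(default_ckpt).parse_args(argv)
    if args.resume_from_checkpoint:
        return f"resume from {args.resume_from_checkpoint}"
    return "start from scratch"

print(plan([]))  # start from scratch
print(plan(["--resume_from_checkpoint", "epoch=3.ckpt"]))  # resume from epoch=3.ckpt
print(plan([], default_ckpt="last.ckpt"))  # resume from last.ckpt
```

In pytorch_lightning 1.x the path was typically passed to `Trainer(resume_from_checkpoint=...)`; later releases moved this to `trainer.fit(..., ckpt_path=...)`, so which one applies depends on the installed version.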

@CracKCatZ Do you know why the loss could be "nan"? At the beginning it was a real float, but now I only see this string. I researched this but didn't find a good explanation.

Jochen-sys, Apr 24 '21 17:04

@Jochen-sys Yes, CUDA and cuDNN are not installed correctly. You can search YouTube for videos on how to install CUDA and cuDNN correctly :)

CracKCatZ, Apr 24 '21 17:04

@CracKCatZ Oh, OK, that's interesting, thanks. I'm using my CPU, though.

Jochen-sys, Apr 24 '21 17:04
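Since training runs on the CPU here, a CUDA or cuDNN installation problem is unlikely to be the cause. A cheap first step is to stop at the first non-finite loss and record which batch (and therefore which audio files) produced it. A framework-agnostic sketch; `log` stands in for the (batch, loss) values a training loop would report, and the batch IDs are hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple

def first_non_finite(losses: Iterable[Tuple[str, float]]) -> Optional[Tuple[int, str, float]]:
    """Return (step, batch_id, loss) for the first nan/inf loss, or None."""
    for step, (batch_id, loss) in enumerate(losses):
        if not math.isfinite(loss):
            return step, batch_id, loss
    return None

# Hypothetical log: the loss is fine until batch "b3" blows up.
log = [("b0", 2.31), ("b1", 1.97), ("b2", 1.80), ("b3", float("inf")), ("b4", float("nan"))]
print(first_non_finite(log))  # (3, 'b3', inf)
```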

@Jochen-sys Are you working with mp3 files now?

CracKCatZ, Apr 24 '21 19:04

On Windows, no, because mp3 files don't work there, but they do work on Linux. I'm training on my Windows system; I only have Linux as a VM.

Jochen-sys, Apr 25 '21 08:04
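Since mp3 loading works in the Linux VM but not on Windows, one workaround is to convert the clips to wav once in the VM and train on the wav files. A sketch that builds one ffmpeg command per file (`-i` input, `-ac` channel count, `-ar` sample rate, `-y` overwrite are standard ffmpeg options); the 8000 Hz mono target is an assumption and should match whatever your training pipeline expects, and `run_all` needs ffmpeg installed.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path
from typing import List

def conversion_commands(mp3_dir: Path, wav_dir: Path, sample_rate: int = 8000) -> List[List[str]]:
    """Build one ffmpeg command per .mp3 file; creates wav_dir if needed."""
    wav_dir.mkdir(parents=True, exist_ok=True)
    cmds = []
    for mp3 in sorted(mp3_dir.glob("*.mp3")):
        wav = wav_dir / (mp3.stem + ".wav")
        # -y overwrite, -ac 1 mono, -ar resample to the target rate
        cmds.append(["ffmpeg", "-y", "-i", str(mp3), "-ac", "1", "-ar", str(sample_rate), str(wav)])
    return cmds

def run_all(cmds: List[List[str]]) -> None:
    for cmd in cmds:
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError if ffmpeg fails

# Dry run on empty placeholder files: only the commands are built and printed.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp)
    for name in ("b.mp3", "a.mp3", "notes.txt"):
        (src / name).touch()
    cmds = conversion_commands(src, src / "wav")
    for cmd in cmds:
        print(shlex.join(cmd))
```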

@CracKCatZ Do you know what this ken_lm is for and where I could get it? Is this the file that improved the transcription in the video so much? If not, what was it?

Jochen-sys, May 01 '21 08:05
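For context on the ken_lm question: KenLM is an n-gram language-model toolkit, and ctcdecode's `CTCBeamDecoder` can take a KenLM model file through its `model_path` argument, weighted by `alpha`, with `beta` acting as a per-word bonus. Roughly, a candidate transcript is scored as log P_acoustic + alpha · log P_LM + beta · (word count). The toy sketch below applies that scoring to a fixed list of candidates with a made-up bigram table; it only illustrates the idea and is not how ctcdecode works internally (it scores prefixes during the beam search).

```python
import math
from typing import Dict, List, Tuple

# Made-up bigram log-probabilities (natural log); "<s>" marks the sentence start.
BIGRAM_LOGP: Dict[Tuple[str, str], float] = {
    ("<s>", "turn"): math.log(0.2),
    ("turn", "on"): math.log(0.5),
    ("on", "the"): math.log(0.4),
    ("the", "light"): math.log(0.1),
    ("the", "lite"): math.log(0.0001),
}
UNSEEN_LOGP = math.log(1e-6)  # crude fallback for unseen bigrams

def lm_logp(words: List[str]) -> float:
    history = ["<s>"] + words
    return sum(BIGRAM_LOGP.get(pair, UNSEEN_LOGP) for pair in zip(history, history[1:]))

def rescore(candidates: List[Tuple[str, float]], alpha: float, beta: float) -> List[Tuple[str, float]]:
    """candidates: (transcript, acoustic log-prob). Returns them best-first."""
    scored = []
    for text, acoustic in candidates:
        words = text.split()
        scored.append((text, acoustic + alpha * lm_logp(words) + beta * len(words)))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# The acoustic model slightly prefers the misspelling; the LM overturns that.
cands = [("turn on the lite", -4.0), ("turn on the light", -4.3)]
print(rescore(cands, alpha=0.0, beta=0.0)[0][0])  # turn on the lite
print(rescore(cands, alpha=0.5, beta=1.0)[0][0])  # turn on the light
```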

@Jochen-sys Have you already tested the speech recognition?

CracKCatZ, May 01 '21 09:05

@CracKCatZ Yes, with the zip model, but it's not very good. I remember that in the video he also used something else to get good results.

Jochen-sys, May 01 '21 09:05
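Without a language model, CTC output is commonly turned into text by greedy (best-path) decoding: take the most likely symbol per frame, merge consecutive repeats, then drop the blank symbol. This is one reason raw transcriptions can look rough compared with beam search plus a language model. A minimal sketch; the alphabet and blank index are hypothetical, and the project's own decoder may differ.

```python
from typing import List, Sequence

BLANK = 0  # hypothetical: index 0 is the CTC blank
ALPHABET = ["_", " ", "a", "c", "t"]  # hypothetical index -> symbol map ("_" = blank)

def greedy_ctc_decode(frame_probs: Sequence[Sequence[float]]) -> str:
    """Best-path decoding: argmax per frame, collapse repeats, remove blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out: List[str] = []
    prev = None
    for idx in best:
        if idx != prev and idx != BLANK:
            out.append(ALPHABET[idx])
        prev = idx
    return "".join(out)

def one_hot(i: int, n: int = len(ALPHABET)) -> List[float]:
    # Toy frame distribution that puts most of the mass on symbol i.
    return [0.9 if j == i else 0.1 / (n - 1) for j in range(n)]

path = [3, 3, 0, 2, 2, 4, 4]  # c c _ a a t t
print(greedy_ctc_decode([one_hot(i) for i in path]))  # cat
```

A blank between two identical symbols keeps them apart, so the path `a _ a` decodes to "aa" while `a a` collapses to "a".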

@Jochen-sys Hold up, did you use the PortAudio library? I think this library is required and can give better results. Could you please tell me whether you already had PortAudio installed when you started working on this project, or whether you had to install it?

CracKCatZ, May 01 '21 20:05

@CracKCatZ Sorry for the late response. Yes, I think so. To come back to the loss=nan problem: why isn't the loss nan when I train on the same wav file for a few epochs? Could I train on only one wav file per training run, or would the result be worse?

Jochen-sys, May 08 '21 11:05
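One cause of a nan/inf loss that fits "one file trains fine, the full set does not", assuming the model is trained with a CTC loss (which the use of ctcdecode suggests), is a sample whose transcript is too long for its spectrogram. CTC needs at least one output frame per label, plus one extra blank frame between each pair of identical consecutive labels; otherwise the sample has zero probability and an infinite loss. A sketch that flags such samples in advance; the frame counts, downsampling factor and sample IDs are hypothetical. PyTorch's `nn.CTCLoss` also has a `zero_infinity` option that zeroes such losses instead of propagating them.

```python
from typing import Iterable, List, Sequence, Tuple

def min_ctc_frames(labels: Sequence[int]) -> int:
    """Fewest output frames CTC needs: one per label plus one blank per repeated pair."""
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats

def infeasible_samples(samples: Iterable[Tuple[str, int, Sequence[int]]],
                       downsample: int = 1) -> List[str]:
    """samples: (sample_id, n_spectrogram_frames, label_ids).

    `downsample` is how much the model shrinks the time axis (e.g. strided
    convolutions); the CTC loss then sees n_frames // downsample output steps.
    """
    bad = []
    for sample_id, n_frames, labels in samples:
        if n_frames // downsample < min_ctc_frames(labels):
            bad.append(sample_id)
    return bad

# Hypothetical data: "b" has 3 labels with one repeat, so it needs 4 frames;
# "c" has 5 labels with one repeat, so it needs 6 frames but only has 3.
data = [
    ("a", 100, [8, 5, 12, 12, 15]),
    ("b", 4, [1, 1, 2]),
    ("c", 3, [8, 5, 12, 12, 15]),
]
print(infeasible_samples(data))  # ['c']
print(infeasible_samples(data, downsample=2))  # ['b', 'c']
```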