A-Hackers-AI-Voice-Assistant
Can I also use mp3 files for training instead of wav files?
Hello, first of all: nice work! I wanted to ask if I could use mp3 files instead of wav files, and if so, which lines I would have to change for that to work.
Hello @Jochen-sys I think that's not possible.
@Jochen-sys because the spectrograms are made out of the wav files
@CracKCatZ Thanks for the quick answer. But I can also create a spectrogram from an mp3 file. Admittedly, I have only tested it with TensorFlow, and I know it works there. If it didn't work with PyTorch, would it be possible to do that one task with TensorFlow and the rest with PyTorch?
@Jochen-sys Hmmm, good question. I actually don't know for sure, but I think yes, you can do one part with TensorFlow and the other with PyTorch; you just need to feed the spectrograms into PyTorch somehow.
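On the mp3 question itself: once the file is decoded into a sample array, the container format stops mattering, so spectrograms can be built from mp3 just as well as from wav (e.g. `librosa.load("clip.mp3")` decodes mp3 if ffmpeg is available). Here is a minimal sketch of the spectrogram step itself in plain NumPy; the function name `log_spectrogram` and the parameter values are my own illustration, not code from this repo:

```python
import numpy as np

def log_spectrogram(signal, n_fft=400, hop=160):
    """Compute a log-magnitude spectrogram from a 1-D audio signal.

    The signal can come from any decoder -- librosa/torchaudio return the
    same kind of float array whether the source was wav or mp3.
    """
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    spec = np.array(frames).T          # shape: (freq_bins, time_steps)
    return np.log(spec + 1e-10)        # epsilon avoids log(0) -> -inf

# Demo on a synthetic 440 Hz tone (stand-in for a decoded mp3/wav clip)
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t).astype(np.float32)
spec = log_spectrogram(tone)
print(spec.shape)
```

Whether the repo's own preprocessing accepts mp3 then only depends on which loader it calls, not on the spectrogram math.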
@Jochen-sys But why would you want to use mp3 files instead of wav files? Wav files are much easier to handle and format :)
@CracKCatZ First of all, I don't have much storage on my computer. But the bigger problem is my GPU, so I have to train on Google Colab. Wav files are too big; mp3 files are much smaller. Another idea was to upload only the spectrograms to Google Colab, so the files wouldn't be too big. Is there a quality difference between wav and mp3? I know that wav files have better quality, but I don't think that matters much for training. Or maybe it's even better, because then the model has to transcribe worse-quality audio.
@Jochen-sys Wav files store the raw, uncompressed audio samples, while mp3 files are lossily compressed, so they are constructed quite differently. I don't know how that would change the performance of the model or the training. You can add me on Discord: SheeeshForce#8083
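The idea above of uploading only precomputed spectrograms to Colab is workable: NumPy arrays can be stored compactly with `np.savez_compressed`, and casting to float16 halves the size with negligible effect on training features. A sketch (file name and shapes are just an example):

```python
import os
import tempfile
import numpy as np

# Hypothetical precomputed spectrogram (freq_bins x time_steps)
spec = np.random.rand(128, 400).astype(np.float32)

out = os.path.join(tempfile.gettempdir(), "utt0001.npz")
# float16 halves the size; zip compression shrinks it further
np.savez_compressed(out, spec=spec.astype(np.float16))

restored = np.load(out)["spec"].astype(np.float32)
print(restored.shape)
```

A dataset loader on the Colab side would then read these `.npz` files directly instead of re-running the audio pipeline.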
@CracKCatZ Thanks for explaining. I wanted to test engine.py, but I got the error "ImportError: cannot import name 'imsave'". imsave comes from scipy.misc, and I found a Stack Overflow answer saying it should be imageio now. I'm confused, because I thought it should work with imsave?! Could you help me out there, please?
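For anyone hitting this: `scipy.misc.imsave` was removed in SciPy 1.2.0, so the Stack Overflow advice is right — `imageio.imwrite` is the usual drop-in replacement. A hedged sketch of a compatibility shim (the helper `save_gray_image` and the PGM fallback are my own illustration, not repo code):

```python
import os
import tempfile
import numpy as np

# engine.py did `from scipy.misc import imsave`; that function was
# removed in SciPy 1.2.0. Try old SciPy first, then imageio:
try:
    from scipy.misc import imsave              # works only on SciPy < 1.2
except ImportError:
    try:
        from imageio import imwrite as imsave  # modern replacement
    except ImportError:
        imsave = None                          # neither available

def save_gray_image(path, array):
    """Save a 2-D uint8 array as an image with whatever backend exists."""
    if imsave is not None:
        imsave(path, array)
    else:
        # dependency-free fallback: binary PGM (P5) format
        h, w = array.shape
        with open(path, "wb") as f:
            f.write(b"P5\n%d %d\n255\n" % (w, h))
            f.write(array.tobytes())

spec_img = (np.random.rand(64, 80) * 255).astype(np.uint8)
ext = ".png" if imsave is not None else ".pgm"
out_path = os.path.join(tempfile.gettempdir(), "spectrogram_demo" + ext)
save_gray_image(out_path, spec_img)
print(os.path.getsize(out_path) > 0)
```

In engine.py itself, the one-line fix is just replacing the import with `from imageio import imwrite as imsave`.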
@Jochen-sys I am actually not familiar with imsave and imageio
@CracKCatZ Ok but can you run engine.py without problems?
@Jochen-sys Not at the moment, because to install ctcdecode I have to switch to Linux.
@CracKCatZ Ok, I'm sorry, I'm an idiot — I fixed it. My problem was that I thought neuralnet was a regular PyPI package and not a custom module from this repo. Why did you name scripts or folders after existing PyPI packages :-) (there is sadly no laughing smiley)?
@CracKCatZ Where exactly are the spectrograms produced? There is so much code touching spectrograms that I can't find the exact place.
@CracKCatZ Which version of ctcdecode do you use? (Mine worked a few days ago, but then it failed.) What is the ken_lm file for? Is this the file that produced such a good transcription in the video?
@Jochen-sys I wasn't able to test ctcdecode yet.
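On the ken_lm question: the file is a KenLM binary language model. ctcdecode's `CTCBeamDecoder` accepts its path via the `model_path` argument and uses it to rescore beam hypotheses with word-sequence probabilities, which is typically what sharpens transcriptions compared to decoding the acoustic model alone. For contrast, here is greedy (best-path) CTC decoding — what you get without any language model; the function and toy labels are my own sketch:

```python
import numpy as np

def greedy_ctc_decode(log_probs, labels, blank=0):
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks.

    log_probs: (time_steps, num_classes) array of per-frame log probabilities.
    A KenLM-backed beam search (e.g. ctcdecode.CTCBeamDecoder with model_path
    pointing at the .bin LM) replaces this argmax with a search that also
    scores word sequences.
    """
    best = np.argmax(log_probs, axis=1)
    out, prev = [], blank
    for idx in best:
        if idx != prev and idx != blank:
            out.append(labels[idx])
        prev = idx
    return "".join(out)

labels = ["_", "h", "e", "l", "o"]   # index 0 is the CTC blank
# Toy frame-wise winners spelling out "h h e _ l l _ l o"
frames = [1, 1, 2, 0, 3, 3, 0, 3, 4]
log_probs = np.full((len(frames), len(labels)), -10.0)
for t, c in enumerate(frames):
    log_probs[t, c] = 0.0
print(greedy_ctc_decode(log_probs, labels))  # -> hello
```

Note how the blank at frame 6 is what allows the double "l" to survive the repeat-collapsing step.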
@CracKCatZ Ok, got it. Do I have to use the ckpt file to train from a checkpoint (the argument for --load_model_from)? And how can I get a zip file at the end of training, or a ckpt file? I think I need a zip file for transcription with the microphone, but I would also like a ckpt file for further training in the future.
Hey @Jochen-sys, yes you do :) The model is saved automatically as a ckpt file :) And yes, I think you need one too (btw, so do I), because without the zip we get no outputs. Could you please add me on Discord so we could talk there and speed up communication? Name: SheeeshForce1#8083
Sorry, I don't have Discord. Ok, thanks. I'm getting the following error when I use the argument --load_model_from speechrecognition.ckpt: RuntimeError: Error(s) in loading state_dict for SpeechModule: Unexpected key(s) in state_dict: "model.cnn.0.weight", "model.cnn.0.bias", "model.cnn.1.norm.weight", "model.cnn.1.norm.bias", "model.dense.0.weight", "model.dense.0.bias", "model.dense.1.weight", "model.dense.1.bias", "model.dense.4.weight", "model.dense.4.bias", "model.dense.5.weight", "model.dense.5.bias", "model.lstm.weight_ih_l0", "model.lstm.weight_hh_l0", "model.lstm.bias_ih_l0", "model.lstm.bias_hh_l0", "model.layer_norm2.weight", "model.layer_norm2.bias", "model.final_fc.weight", "model.final_fc.bias".
Does anyone know what this means?
Then I tried the argument --resume_from_checkpoint (I don't know what this argument does, sorry) instead of --load_model_from. But this doesn't work either. Following error: checkpoint_callbacks[-1].best_model_path = checkpoint['checkpoint_callback_best_model_path'] KeyError: 'checkpoint_callback_best_model_path'
Ok, I fixed the first error: my version of pytorch_lightning was too old. But what does the --resume_from_checkpoint argument mean?
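For future readers of the "Unexpected key(s) in state_dict" error above: the keys all carry a `model.` prefix, which usually means the checkpoint was saved from a wrapper module (here the LightningModule holding the network as `self.model`) while the keys are being loaded into something expecting bare names. Upgrading pytorch_lightning resolved it here, but a common generic workaround is stripping the prefix before `load_state_dict`. A sketch with plain values standing in for tensors (`strip_prefix` is my own helper name):

```python
def strip_prefix(state_dict, prefix="model."):
    """Remove a wrapper prefix from checkpoint keys.

    Useful when a checkpoint was saved from a module that holds the
    network as self.model but you want to load the bare network
    (add the prefix instead for the opposite direction).
    """
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

# Illustration with plain ints standing in for tensors
ckpt = {"model.cnn.0.weight": 1, "model.final_fc.bias": 2}
print(strip_prefix(ckpt))
# real usage would be something like:
# net.load_state_dict(strip_prefix(torch.load(path)["state_dict"]))
```

Alternatively, `load_state_dict(..., strict=False)` silences the mismatch, but that can hide genuinely missing weights, so renaming keys is safer.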
@Jochen-sys It means that you pass in a checkpoint file, either as a default or through the terminal (set required=False if you set it as a default), and training is resumed from that checkpoint. You basically use it to resume training after stopping it, to test a checkpoint (the model you create in optimize_graph.py), or if your PC shut down for an unknown reason while training.
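Concretely, the flag is just an argparse option whose value ends up in the Lightning Trainer. A sketch of how a train script might wire it up, assuming the argument name from this thread (the Trainer call is left as a comment since it needs pytorch_lightning installed; older Lightning versions took `resume_from_checkpoint` directly in the `Trainer` constructor):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--resume_from_checkpoint", default=None, required=False,
                    help="path to a .ckpt file to resume training from")

# simulate:  python train.py --resume_from_checkpoint speechrecognition.ckpt
args = parser.parse_args(["--resume_from_checkpoint", "speechrecognition.ckpt"])
print(args.resume_from_checkpoint)

# trainer = pl.Trainer(resume_from_checkpoint=args.resume_from_checkpoint)
# trainer.fit(speech_module)
```

With `default=None`, omitting the flag simply starts training from scratch, which is why making it optional (required=False) is the usual choice.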
@CracKCatZ Do you know why the loss could be "nan"? At the beginning it was a real float, but now I only see this string. I researched it but didn't find a clear cause.
@Jochen-sys Yes, CUDA and cuDNN are not installed the right way. You can search YouTube for videos on a correct CUDA and cuDNN installation :)
@CracKCatZ Oh ok, that's interesting, thanks. But I'm using my CPU.
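Since CUDA is ruled out on a CPU run, nan loss more often comes from the data or the loss itself: `log(0)` on silent frames producing -inf features, a learning rate that is too high, or CTC loss receiving an input sequence shorter than its target transcript. A small pre-training audit can catch the first and last of these; the helper `audit_spectrogram` is my own sketch, not repo code:

```python
import numpy as np

def audit_spectrogram(spec, target_len=None):
    """Check a feature array for common CPU-side causes of loss=nan.

    - NaN/Inf in the features (e.g. log(0) on silent frames)
    - CTC needs input time steps >= target length, else loss goes inf/nan
    """
    report = {
        "has_nan": bool(np.isnan(spec).any()),
        "has_inf": bool(np.isinf(spec).any()),
    }
    if target_len is not None:
        report["ctc_length_ok"] = spec.shape[-1] >= target_len
    return report

silent = np.zeros((128, 50))       # a silent clip: all-zero energy
bad = np.log(silent)               # log(0) -> -inf, poisons the loss
good = np.log(silent + 1e-10)      # epsilon keeps every value finite

print(audit_spectrogram(bad, target_len=20))
print(audit_spectrogram(good, target_len=20))
```

If the audit is clean, the next things to try are lowering the learning rate and enabling gradient clipping.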
@Jochen-sys are you working with mp3 files now?
On Windows, no, because mp3 files don't work there, but they do work on Linux. I'm training on my Windows system; I only have Linux as a VM.
@CracKCatZ Do you know what this ken_lm file is for and where I could get it? Is it the file that improved the transcription in the video so much? If not, what was the file that improved the transcription so much?
@Jochen-sys Have you already tested the speech recognition?
@CracKCatZ Yes, with the zip model. But it's not very good. I remember that in the video he used something else, too, to get good results.
@Jochen-sys Hold up, did you use the portaudio library? I think that library is required and can give better results. Could you please tell me if you had portaudio already installed at the beginning of working with this project, or if you had to install it?
@CracKCatZ Sorry for the late response. Yes, I think so. To come back to the loss=nan problem: why isn't the loss nan when I train on the same wav file for a few epochs? Could I train on only one wav file per run, or would the result be worse?