kaldi-active-grammar

Personalized/Incremental Speech Model Training

Open kendonB opened this issue 5 years ago • 21 comments

I'm not sure if Kaldi is capable of this, but DPI 15 seems to allow for "incremental" training, where the program starts with a base model and then very quickly learns from the user's training speech. On my setup, it seems to improve within a few seconds of saving the user profile. Is Kaldi capable of this type of learning?

kendonB avatar Jan 20 '20 04:01 kendonB

Although this isn't officially advised/supported in Kaldi, I recently got it working. I have more testing to do to determine what the sweet spot is for the amount of training data, but preliminary results are very positive. I plan on posting some numbers on this soon.

However, performing the training on the client machine is extremely difficult, due to Kaldi's many dependencies for training. So for the near term, I think collecting the data locally and then performing the training in the cloud may be most practical.

daanzu avatar Jan 20 '20 04:01 daanzu

Some numbers are posted in https://github.com/daanzu/kaldi-active-grammar/blob/master/docs/models.md

daanzu avatar Apr 20 '20 14:04 daanzu

@daanzu Is there anything I can do to help beta test this?

kendonB avatar Apr 26 '20 21:04 kendonB

@kendonB Eventually I would like to streamline it and package it up in Docker or something, but there's more work to be done for that.

However, if you're comfortable sending me some audio and transcripts for training, I can use you for testing. I will be posting more info on recommended ways for collecting this training data soon.

daanzu avatar Apr 27 '20 08:04 daanzu

I would really love to see this Docker container! Something semi-ready would be welcome, too.

JohnDoe02 avatar May 24 '20 17:05 JohnDoe02

Hi @daanzu, just revisiting this. I know there was some progress discussed here: https://github.com/daanzu/kaldi-active-grammar/issues/33

Do you have anything to share that would be noob-friendly?

kendonB avatar Jul 13 '21 01:07 kendonB

@kendonB Thanks for reminding me! I have been slow in making this completely noob-friendly, and therefore afraid of releasing it, but I do have something you could try out. Do you have some audio data with transcripts, and a docker installation? Having CUDA is nice, but not a strict requirement if you don't mind having patience with the CPU.

daanzu avatar Jul 13 '21 06:07 daanzu

I can make some training data - do you have some suggested transcripts to read? Docker + CUDA are both easy to get going. Would it be easiest to set up a private repo that we can iterate on?

kendonB avatar Jul 27 '21 22:07 kendonB

@kendonB Great! I would say there are two good ways to gather training data: retaining data from your current speech recognition usage, and recording training data directly. Recording directly is certainly most efficient in gathering the best training data, though retaining is a relatively easy way to gather a lot, and there are a few ways to try to weed out any bad data.

Here's an app I threw together for recording data directly and storing it in an easy format. It comes with a few standard sets of training sentences that try to cover the range of English sounds evenly. https://github.com/daanzu/speech-training-recorder
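A minimal sketch of one way to weed out obviously bad data before training. It assumes the recordings sit next to a tab-separated manifest whose rows are `<wav file name><TAB><transcript>`; that layout, the function names, and the duration bounds are all hypothetical and are not taken from the recorder app or the training scripts.

```python
import csv
import wave
from pathlib import Path

# Hypothetical bounds; tune them for your own recordings.
MIN_SECONDS = 0.3
MAX_SECONDS = 30.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def find_bad_entries(manifest_path):
    """Return (wav_name, reason) pairs for manifest rows that look unusable.

    Assumes each row is "<wav file name>\t<transcript>", with WAV paths
    relative to the manifest's directory (an assumed layout).
    """
    manifest_path = Path(manifest_path)
    bad = []
    with manifest_path.open(newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quote characters in transcripts as plain text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row:
                continue  # skip blank lines
            wav_name = row[0]
            transcript = row[1].strip() if len(row) > 1 else ""
            wav_path = manifest_path.parent / wav_name
            if not transcript:
                bad.append((wav_name, "empty transcript"))
            elif not wav_path.is_file():
                bad.append((wav_name, "missing audio file"))
            else:
                try:
                    seconds = wav_duration_seconds(wav_path)
                except (wave.Error, EOFError):
                    bad.append((wav_name, "unreadable WAV"))
                    continue
                if not MIN_SECONDS <= seconds <= MAX_SECONDS:
                    bad.append((wav_name, "implausible duration"))
    return bad
```

Calling `find_bad_entries("audio_data/manifest.tsv")` (a hypothetical path) returns the rows worth inspecting or dropping. It only catches mechanical problems; it cannot tell whether the audio actually matches the transcript.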

I will put up a repo with the training setup.

daanzu avatar Jul 28 '21 01:07 daanzu

Hi, @daanzu. I'm also super interested in this and happy to help in any way that I can. I'm pretty new to dictation in general, but strong with Python and Docker. I'm not sure what you're using to train; I have some experience with TensorFlow and Keras, though nothing too advanced. I've recorded a bunch of data with your recorder and have NVIDIA GPUs with CUDA ready to throw at it :)

bluecamel avatar Jul 31 '21 05:07 bluecamel

@bluecamel Great! I should have scripts to play with uploaded within another day or two. FYI, all standard Kaldi model training uses Kaldi's own nnet code (rather than TensorFlow/PyTorch/etc.), but of course it has full CUDA support. I put together separate Docker images for CPU and CUDA: https://hub.docker.com/u/daanzu

daanzu avatar Aug 01 '21 15:08 daanzu

@daanzu I have made some recordings and have the Docker image downloaded. I presume I just need to get the audio data in audio_data into the Docker image and then hit go. How do I do that? I have CUDA.

kendonB avatar Aug 04 '21 01:08 kendonB

@kendonB Ah, yes, there are still some top-level scripts I need to add to the image, and instructions. Hopefully tonight!

daanzu avatar Aug 04 '21 03:08 daanzu

@daanzu I can't see those added here: https://hub.docker.com/u/daanzu. Are they somewhere else? Apologies for the noob questions.

kendonB avatar Aug 09 '21 04:08 kendonB

@kendonB Sorry, I've been busy, and haven't had a chance to finish adding the necessary scripts yet. Really hope to soon, and will post an update when it's ready.

daanzu avatar Aug 09 '21 14:08 daanzu

Terribly late and entirely untested (as of yet): https://github.com/daanzu/kaldi_ag_training

daanzu avatar Aug 13 '21 08:08 daanzu

@daanzu Yay! Can't wait to try it, but the mentioned tag (2021-08-04) isn't on Docker Hub.

bluecamel avatar Aug 14 '21 20:08 bluecamel

Ah, sorry, I see that I can just build the image. I'll try that. If you don't mind, I'll make a PR with some adjustments to the docs as I go along to help other amateurs like me. :)

bluecamel avatar Aug 14 '21 22:08 bluecamel

@bluecamel Oops, actually I think that is leftover and 2020-11-28 would be fine. I pushed an update. Thanks for any help!

daanzu avatar Aug 15 '21 01:08 daanzu

FYI, I decided to try to treat the Docker image as evergreen, and to keep the things that are liable to change a lot, like scripts, in the git repo instead.

daanzu avatar Aug 15 '21 01:08 daanzu

Hi @daanzu, have you made any progress on this one? I don't think I mentioned it here, but I never got it to work, even with many recordings. Do you know of anyone who managed to get it to work?

kendonB avatar Dec 07 '22 20:12 kendonB