
Howl for different languages

Open codeghees opened this issue 4 years ago • 7 comments

I am currently building a pipeline for a research project that requires keyword spotting (KWS), and I am unsure which tool would be the better fit.

In our use case, we want to identify keywords in streams of audio data, not in a wake-word setting. Can I use Howl for that purpose? The model will be served via an API, and since it is supervised learning, we want to be able to readily add new words over time as well.

codeghees avatar Sep 09 '20 13:09 codeghees

I think howl should be sufficient for that.

honk mainly targets keyword classification, while howl supports keyword spotting over streams of audio with an extra inference (filtering) mechanism.

ljj7975 avatar Sep 12 '20 16:09 ljj7975
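
To make the "extra inference (filtering)" idea concrete, here is a minimal sketch of one way per-window predictions from a streaming classifier could be filtered into keyword detections. It is not howl's actual implementation: the function name `detect_sequence`, the `(timestamp, label)` input format, and the `window_s` parameter are all assumptions made for illustration.

```python
from collections import deque


def detect_sequence(predictions, sequence, window_s=2.0):
    """Return timestamps at which `sequence` was observed in order.

    predictions: iterable of (timestamp_seconds, label) pairs, one per audio
                 window (hypothetical format, not howl's API).
    sequence:    non-empty list of labels that must appear in this order.
    window_s:    how far back (in seconds) earlier labels may lie.
    """
    if not sequence:
        raise ValueError("sequence must contain at least one label")

    history = deque()
    detections = []
    for ts, label in predictions:
        history.append((ts, label))
        # Forget predictions that fell out of the time window.
        while history and ts - history[0][0] > window_s:
            history.popleft()

        # Check whether `sequence` is an in-order subsequence of the history.
        matched = 0
        for _, seen in history:
            if seen == sequence[matched]:
                matched += 1
                if matched == len(sequence):
                    break

        if matched == len(sequence):
            detections.append(ts)
            history.clear()  # avoid reporting the same utterance twice
    return detections


stream = [(0.0, 3), (0.5, 0), (1.0, 3), (1.2, 1), (1.5, 2), (5.0, 0), (9.0, 1)]
hits = detect_sequence(stream, [0, 1, 2])
```

The point of the filtering step is that a single positive window is not reported on its own; a detection is raised only once the full label sequence has been seen within the time window.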

Thank you. What steps would I need to change for Urdu keywords (our local language)?

codeghees avatar Sep 21 '20 09:09 codeghees

Hmm, that's an interesting direction.

Adding a new dataset to the system can be achieved with a change similar to https://github.com/castorini/howl/pull/31/files

However, I don't think other languages are currently supported by howl. The main limitation is the missing frame-level transcription.

@daemon do you know how one can support other languages?

ljj7975 avatar Sep 21 '20 22:09 ljj7975

@ljj7975 we can generate our own pronunciation dictionary using a method we developed in our lab. Would that help?

codeghees avatar Sep 29 '20 09:09 codeghees

I am not that familiar with how the MFA aligner actually works in such cases. This would be something that you will need to dig into.

As long as you can generate data in the right format with the corresponding frame-level transcription, I don't see why not.

ljj7975 avatar Oct 03 '20 16:10 ljj7975

Hi, I think I was able to figure out MFA for Urdu. How do I go about supporting it? @ljj7975 Any help is appreciated.

https://github.com/castorini/howl#preparing-a-dataset

It supports only one word - how do I support multiple?

codeghees avatar Nov 03 '20 11:11 codeghees

As instructed in the README, you will first need to preprocess your raw datasets using create_raw_dataset. You should generate one dataset for positive audio and one for negative audio. Depending on how your raw dataset is structured, you might need to modify some files (just like the change in https://github.com/castorini/howl/pull/31).

Then, using MFA with the Urdu dictionary, you can align the dataset to produce the datasets howl needs.

The instructions show just one keyword, but the same setup works for multiple keywords. Just specify VOCAB='["fire"]' INFERENCE_SEQUENCE=[0] accordingly.

ljj7975 avatar Jan 09 '21 20:01 ljj7975
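
As a rough illustration of extending the `VOCAB` / `INFERENCE_SEQUENCE` setting above to several keywords, the sketch below builds both values from a keyword list. The helper `keyword_env` is hypothetical, and it assumes that `INFERENCE_SEQUENCE` holds indices into `VOCAB` in the order the words should be detected, which is how the single-keyword example reads but is not confirmed in this thread.

```python
import json


def keyword_env(keywords):
    """Build VOCAB and INFERENCE_SEQUENCE strings for a list of keywords.

    Hypothetical helper: assumes INFERENCE_SEQUENCE lists indices into VOCAB
    in the order the keywords are expected to be spoken.
    """
    keywords = list(keywords)
    if not keywords:
        raise ValueError("at least one keyword is required")
    # ensure_ascii=False keeps non-Latin keywords (e.g. Urdu) readable.
    vocab = json.dumps(keywords, ensure_ascii=False, separators=(",", ":"))
    sequence = json.dumps(list(range(len(keywords))), separators=(",", ":"))
    return {"VOCAB": vocab, "INFERENCE_SEQUENCE": sequence}


env = keyword_env(["hey", "fire", "fox"])
# Print in the same VAR=value form used in the comment above.
print(f"VOCAB='{env['VOCAB']}' INFERENCE_SEQUENCE={env['INFERENCE_SEQUENCE']}")
```

Whether the keywords in your Urdu dictionary need any further normalization before they go into `VOCAB` depends on how the aligned dataset spells them, so it is worth checking the two against each other.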