
How to create an example of the X-vector of the speaker (voice fingerprint)?

Open • arbdevml opened this issue 1 year ago • 5 comments

Hello. First of all, a very big thank you for this project.

I am trying to create an example with a speaker model to get the X-vector of the speaker (voice fingerprint).

I am using this example: https://github.com/ccoreilly/vosk-browser/blob/master/examples/words-vanilla/index.js

const model = await Vosk.createModel('vosk-model-small-en-in-0.4.tar.gz');
const speakerModel = await Vosk.createSpeakerModel('vosk-model-spk-0.4.zip');

...

const recognizer = new model.KaldiRecognizer(sampleRate, JSON.stringify(['[unk]', 'encen el llum', 'apaga el llum']));
recognizer.setSpkModel(speakerModel);
recognizer.on("result", (message) => {
	const result = message.result;
	if(result.hasOwnProperty('spk'))
		console.info("X-vector:", result.spk);
});

Speaker identification model: https://alphacephei.com/vosk/models/vosk-model-spk-0.4.zip

Node.js example: https://github.com/alphacep/vosk-api/blob/master/nodejs/demo/test_speaker.js

Could you offer some advice, please:

  1. How to load vosk-model-spk-0.4.zip
  2. How to implement the methods createSpeakerModel and setSpkModel
  3. How to fetch the X-vector of the speaker (voice fingerprint)?

Thank you for your answer.
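Regarding question 3: once a speaker model is attached, Vosk adds the x-vector to final results under `spk`, and, as far as I know, `spk_frames` gives the number of frames it was computed from. A minimal TypeScript sketch for pulling it out of a result string, assuming that field layout; `extractXVector`, `VoskResult` and `XVector` are hypothetical names, not part of vosk-browser:

```typescript
// Fields this sketch reads from a Vosk result JSON string.
// Assumption: with a speaker model set, final results carry "spk" (the
// x-vector) and "spk_frames" (how many frames it was computed from).
interface VoskResult {
  text?: string;
  spk?: number[];
  spk_frames?: number;
}

interface XVector {
  vector: number[];
  frames: number;
}

// Parse a result string (e.g. from KaldiRecognizer.Result()) and return the
// x-vector, or null when the result carries no speaker information.
function extractXVector(resultJson: string): XVector | null {
  const result = JSON.parse(resultJson) as VoskResult;
  const spk = result.spk;
  if (!Array.isArray(spk) || spk.length === 0) {
    return null;
  }
  return { vector: spk, frames: result.spk_frames ?? 0 };
}

// Usage with a hand-written result string (values are made up).
const xv = extractXVector('{"text": "hello", "spk": [0.1, -0.2, 0.3], "spk_frames": 120}');
if (xv) {
  console.info("X-vector:", xv.vector, "from", xv.frames, "frames");
}
```

Partial results are not expected to contain `spk`, so returning `null` there is the normal case rather than an error.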

arbdevml • Aug 15 '22

Hi Alexandro!

The constructor for SpkModel should already be available, but the KaldiRecognizer setter Recognizer::SetSpkModel still needs to be exposed in src/bindings.cc. Once that is done, the speaker x-vector should be returned together with the result.

The constructor Recognizer::Recognizer(Model *model, float sample_frequency, SpkModel *spk_model) should also be exposed for completeness.
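Once the x-vector comes back with the result, a speaker is usually identified by comparing it against stored reference vectors; as far as I recall, the vosk-api speaker demos use cosine distance for this. A hedged TypeScript sketch; `cosineDistance`, `identifySpeaker` and the threshold parameter are illustrative, not part of vosk-browser:

```typescript
// Cosine distance between two x-vectors: 1 - (x . y) / (|x| * |y|).
// 0 means the vectors point the same way; larger values mean less similar.
function cosineDistance(x: readonly number[], y: readonly number[]): number {
  if (x.length !== y.length || x.length === 0) {
    throw new Error(`x-vector length mismatch: ${x.length} vs ${y.length}`);
  }
  let dot = 0;
  let nx = 0;
  let ny = 0;
  for (let i = 0; i < x.length; i++) {
    dot += x[i] * y[i];
    nx += x[i] * x[i];
    ny += y[i] * y[i];
  }
  if (nx === 0 || ny === 0) {
    throw new Error("cannot compare an all-zero x-vector");
  }
  return 1 - dot / (Math.sqrt(nx) * Math.sqrt(ny));
}

// Hypothetical helper: return the closest enrolled speaker, but only if its
// distance is within `threshold`; otherwise report an unknown speaker (null).
function identifySpeaker(
  xvector: readonly number[],
  enrolled: Record<string, readonly number[]>,
  threshold: number,
): string | null {
  let best: string | null = null;
  let bestDist = Infinity;
  for (const name of Object.keys(enrolled)) {
    const d = cosineDistance(xvector, enrolled[name]);
    if (d < bestDist) {
      bestDist = d;
      best = name;
    }
  }
  return best !== null && bestDist <= threshold ? best : null;
}
```

The threshold has no universal value; it would need to be tuned on recordings of the actual speakers, and averaging several enrollment x-vectors per speaker tends to be more robust than a single one.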

Go ahead if you want to give it a try. Otherwise, I will make some time for it next week.

ccoreilly • Aug 19 '22

Preparing the builder environment:

apt update && apt -y upgrade
apt install -y build-essential git sudo screen curl

curl -sSL https://get.docker.com | sh
sudo usermod -aG docker $(whoami)
# Log out and back in (or run `newgrp docker`) so the docker group membership takes effect.
docker run hello-world

cd $HOME
git clone --recursive https://github.com/ccoreilly/vosk-browser
cd vosk-browser
screen
time make builder
time make binary

arbdevml • Sep 01 '22

Updated files: src/vosk.d.ts

export declare class Model {
  constructor(path: string);
  public delete(): void;
}

export declare class SpkModel {
  constructor(path: string);
  public delete(): void;
}

export declare class KaldiRecognizer {
  constructor(model: Model, sampleRate: number);
  constructor(model: Model, sampleRate: number, grammar: string);
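  // Speaker-model variant as a static factory: embind overloads constructors by argument count only.
  public static NewWithSpkModel(model: Model, sampleRate: number, spkModel: SpkModel): KaldiRecognizer;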
  public SetSpkModel(spkModel: SpkModel): void;
  public SetWords(words: boolean): void;
  public AcceptWaveform(address: number, length: number): boolean;
  public Result(): string;
  public PartialResult(): string;
  public FinalResult(): string;
  public delete(): void;
}
export declare interface Vosk {
  FS: {
    mkdir: (dirName: string) => void;
    mount: (fs: any, opts: any, path: string) => void;
  };
  MEMFS: Record<string, any>;
  IDBFS: Record<string, any>;
  WORKERFS: Record<string, any>;
  HEAPF32: any;
  downloadAndExtract: (url: string, localPath: string) => void;
  syncFilesystem: (fromPersistent: boolean) => void;
  Model: typeof Model;
  SpkModel: typeof SpkModel;
  KaldiRecognizer: typeof KaldiRecognizer;
  SetLogLevel(level: number): void;
  GetLogLevel(): number;
  _malloc: (size: number) => number;
  _free: (buffer: number) => void;
}

export default function LoadVosk(): Promise<Vosk>;
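With these declarations, audio has to be copied into the Emscripten heap before `AcceptWaveform(address, length)` can read it, because the C++ binding casts the address to a `float*` and treats `length` as a sample count. A minimal sketch, assuming mono float samples at the recognizer's sample rate; `acceptChunk`, `VoskHeap` and `Recognizer` are hypothetical names mirroring a subset of the declarations above:

```typescript
// Subset of the Vosk module declaration that this sketch needs.
interface VoskHeap {
  HEAPF32: Float32Array;
  _malloc: (size: number) => number;
  _free: (buffer: number) => void;
}

interface Recognizer {
  AcceptWaveform(address: number, length: number): boolean;
  Result(): string;
  PartialResult(): string;
}

// Copy one chunk of samples into the wasm heap, feed it to the recognizer and
// return the JSON string of either a final or a partial result.
// Assumption: samples are already in the range Vosk expects. Vosk is
// Kaldi-based, which typically works on 16-bit PCM scale, so Web Audio floats
// in [-1, 1] may need scaling by 32768 first.
function acceptChunk(
  vosk: VoskHeap,
  rec: Recognizer,
  samples: Float32Array,
): { final: boolean; json: string } {
  const bytes = samples.length * Float32Array.BYTES_PER_ELEMENT;
  const ptr = vosk._malloc(bytes);
  try {
    // HEAPF32 is indexed in 4-byte elements, so convert the byte address.
    vosk.HEAPF32.set(samples, ptr >> 2);
    const isFinal = rec.AcceptWaveform(ptr, samples.length);
    return { final: isFinal, json: isFinal ? rec.Result() : rec.PartialResult() };
  } finally {
    // Always release the buffer, even if the recognizer throws.
    vosk._free(ptr);
  }
}
```

With a speaker model set, the `json` of a final result is where the `spk` field should appear.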

src/bindings.cc

// Copyright 2020 Denis Treskunov
// Copyright 2021 Ciaran O'Reilly

#include <emscripten/bind.h>
#include "utils.h"
#include "../vosk/src/kaldi_recognizer.h"
#include "../vosk/src/model.h"
#include "../vosk/src/spk_model.h"

using namespace emscripten;

namespace emscripten {
    namespace internal {
        template<> void raw_destructor<Model>(Model* ptr) { /* do nothing */ }
        template<> void raw_destructor<SpkModel>(SpkModel* ptr) { /* do nothing */ }
    }
}

struct ArchiveHelperWrapper : public wrapper<ArchiveHelper> {
    EMSCRIPTEN_WRAPPER(ArchiveHelperWrapper);
    void onsuccess() {
        return call<void>("onsuccess");
    }
    void onerror(const std::string &what) {
        return call<void>("onerror", what);
    }
};

static Model *makeModel(const std::string &model_path) {
    try {
        return new Model(model_path.c_str());
    } catch (std::exception &e) {
        KALDI_ERR << "Exception in Model ctor: " << e.what();
        throw;
    }
}

static SpkModel *makeSpkModel(const std::string &model_path) {
    try {
        return new SpkModel(model_path.c_str());
    } catch (std::exception &e) {
        KALDI_ERR << "Exception in SpkModel ctor: " << e.what();
        throw;
    }
}

static KaldiRecognizer* makeRecognizerWithGrammar(Model *model, float sample_frequency, const std::string &grammar) {
    try {
        KALDI_VLOG(2) << "Creating model with grammar";
        return new KaldiRecognizer(model, sample_frequency, grammar.c_str());
    } catch (std::exception &e) {
        KALDI_ERR << "Exception in KaldiRecognizer ctor: " << e.what();
        throw;
    }
}

static KaldiRecognizer* makeRecognizerWithSpk(Model *model, float sample_frequency, SpkModel *spk_model) {
    try {
        KALDI_VLOG(2) << "Creating model with spk";
        return new KaldiRecognizer(model, sample_frequency, spk_model);
    } catch (std::exception &e) {
        KALDI_ERR << "Exception in KaldiRecognizer ctor: " << e.what();
        throw;
    }
}

static void KaldiRecognizer_SetSpkModel(KaldiRecognizer &self, SpkModel *spk_model)
{
    KALDI_VLOG(2) << "Setting SpkModel";
    self.SetSpkModel(spk_model);
}

static void KaldiRecognizer_SetWords(KaldiRecognizer &self, int words) {
    KALDI_VLOG(2) << "Setting words to " << words;
    self.SetWords(words);
}

static bool KaldiRecognizer_AcceptWaveform(KaldiRecognizer &self, long jsHeapAddr, int len) {
    const float *fdata = (const float*) jsHeapAddr;
    KALDI_VLOG(3) << "AcceptWaveform received len=" << len << " 0=" << fdata[0] << " " << len-1 << "=" << fdata[len-1];
    
    return self.KaldiRecognizer::AcceptWaveform(fdata, len);
}

static string KaldiRecognizer_Result(KaldiRecognizer &self) {
    std::string s;
    s += self.KaldiRecognizer::Result();
    
    return s;
}

static string KaldiRecognizer_FinalResult(KaldiRecognizer &self) {
    std::string s;
    s += self.KaldiRecognizer::FinalResult();
    
    return s;
}

static string KaldiRecognizer_PartialResult(KaldiRecognizer &self) {
    std::string s;
    s += self.KaldiRecognizer::PartialResult();
    
    return s;
}

EMSCRIPTEN_BINDINGS(vosk) {
    class_<ArchiveHelper>("ArchiveHelper")
        .function("Extract", &ArchiveHelper::Extract)
        .allow_subclass<ArchiveHelperWrapper>("ArchiveHelperWrapper")
        .function("onsuccess", optional_override([](ArchiveHelper& self) {
            return self.ArchiveHelper::onsuccess();
        }))
        .function("onerror", optional_override([](ArchiveHelper& self, const std::string &what) {
            return self.ArchiveHelper::onerror(what);
        }))
        ;

    class_<Model>("Model")
        .constructor(&makeModel, allow_raw_pointers())
        ;

    class_<SpkModel>("SpkModel")
        .constructor(&makeSpkModel, allow_raw_pointers())
        ;

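    class_<KaldiRecognizer>("KaldiRecognizer")
        .constructor(&makeRecognizerWithGrammar, allow_raw_pointers())
        .constructor<Model *, float>(allow_raw_pointers())
        // embind picks constructor overloads by argument count only, so the
        // 3-argument speaker-model variant cannot be a second constructor next to
        // the grammar one. Expose it as a static factory (name chosen here).
        // Vosk has no KaldiRecognizer(SpkModel *, float) constructor, which is what
        // caused "no matching constructor for initialization of KaldiRecognizer".
        .class_function("NewWithSpkModel", &makeRecognizerWithSpk, allow_raw_pointers())
        .function("SetWords", &KaldiRecognizer_SetWords)
        // SetSpkModel takes a raw SpkModel *, which embind only accepts with
        // allow_raw_pointers() (likely the source of the static_assert failure).
        .function("SetSpkModel", &KaldiRecognizer_SetSpkModel, allow_raw_pointers())
        .function("AcceptWaveform", &KaldiRecognizer_AcceptWaveform)
        .function("Result", &KaldiRecognizer_Result)
        .function("FinalResult", &KaldiRecognizer_FinalResult)
        .function("PartialResult", &KaldiRecognizer_PartialResult)
        ;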
    
    emscripten::function("SetLogLevel", &SetVerboseLevel);
    emscripten::function("GetLogLevel", &GetVerboseLevel);
}

I am facing these errors:

  • no matching constructor for initialization of KaldiRecognizer
  • static_assert failed due to requirement '!std::is_pointer<SpkModel *>::value' "Implicitly binding raw pointers is illegal. Specify allow_raw_pointer<arg<?>>"
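Assuming the bindings compile once `SetSpkModel` is registered with `allow_raw_pointers()`, the recognizer could be wired up from TypeScript with the two-argument constructor followed by `SetSpkModel`. A sketch under that assumption; `createSpeakerSession` and the structural interfaces are hypothetical, and the model paths are placeholders for wherever the archives were extracted in the Emscripten filesystem:

```typescript
// Structural types mirroring the relevant parts of vosk.d.ts.
interface Deletable {
  delete(): void;
}
interface RecognizerWithSpk extends Deletable {
  SetSpkModel(spkModel: Deletable): void;
  SetWords(words: boolean): void;
}
interface VoskModule {
  Model: new (path: string) => Deletable;
  SpkModel: new (path: string) => Deletable;
  KaldiRecognizer: new (model: Deletable, sampleRate: number) => RecognizerWithSpk;
}

interface SpeakerSession {
  recognizer: RecognizerWithSpk;
  dispose(): void;
}

// Hypothetical helper: load both models from local paths, create a recognizer
// and attach the speaker model so final results include the x-vector.
function createSpeakerSession(
  vosk: VoskModule,
  modelPath: string,
  spkModelPath: string,
  sampleRate: number,
): SpeakerSession {
  const model = new vosk.Model(modelPath);
  const spkModel = new vosk.SpkModel(spkModelPath);
  const recognizer = new vosk.KaldiRecognizer(model, sampleRate);
  recognizer.SetWords(true);
  recognizer.SetSpkModel(spkModel);
  return {
    recognizer,
    dispose() {
      // Delete the recognizer first, since it refers to both models.
      // The bindings make the Model/SpkModel destructors no-ops, so deleting
      // those only releases the JS handles.
      recognizer.delete();
      spkModel.delete();
      model.delete();
    },
  };
}
```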

arbdevml • Sep 01 '22

A very big thank you, Ciaran O'Reilly, for your answer.

arbdevml • Sep 01 '22

Hi @arbdevml, sorry for my late reply. I'll check your changes. In the future, it'd be easier if you forked the repository and shared your changes in a branch of your fork. That way, it is pretty straightforward to check it out and test.

ccoreilly • Sep 09 '22