Raivis Dejus
Adding Latvian cleaners to filter out sentences with broken encoding.
Adding validation for sentences with a question mark before a lowercase character. This PR extends the comments mentioned in https://github.com/common-voice/CorporaCreator/pull/126. The current code in this PR does not do any special validations for...
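As a rough illustration of the kind of check described above, the sketch below flags a sentence when a question mark is immediately followed by a lowercase letter, which often indicates a character lost to a broken encoding. This is not the code from the PR: the function name `has_question_mark_before_lowercase` is hypothetical, and treating only a directly adjacent letter (no whitespace in between) as suspicious is an assumption.

```python
def has_question_mark_before_lowercase(sentence: str) -> bool:
    """Return True if any '?' is directly followed by a lowercase letter.

    Hypothetical helper, not taken from CorporaCreator. str.islower() is
    Unicode-aware, so Latvian letters such as 'ā' or 'š' are covered.
    """
    return any(
        current == "?" and following.islower()
        for current, following in zip(sentence, sentence[1:])
    )


# A '?' in the middle of a word is flagged; a normal question is not.
print(has_question_mark_before_lowercase("K?da diena"))
print(has_question_mark_before_lowercase("Kā iet? Labi."))
```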
Adding rules for Latvian. This PR is to not lose the rules created. Please DO NOT process Latvian any further. A subset of Latvian wiki sentences has been added to...
Adjusted the instructions in the Readme to use a more recent version of wikiextractor. It seems to be able to extract more content. In my tests for Latvian, I...
Adding another replacement option that can process regexes. This can be used to split longer sentences into smaller chunks. In my tests for Latvian, this can yield ~25% more sentences...
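The idea of splitting long sentences with a regex replacement could look roughly like the sketch below. It is an illustration only, not the implementation from the PR: the rule list `SPLIT_RULES`, the semicolon pattern, and the function `split_long_sentence` are all assumptions made for this example.

```python
import re

# Hypothetical rule: turn "; " into a sentence boundary. Each entry is a
# (compiled pattern, replacement) pair; "\n" marks where to cut afterwards.
SPLIT_RULES = [(re.compile(r";\s+"), ".\n")]


def split_long_sentence(sentence: str) -> list[str]:
    """Apply every regex replacement, then cut the result into chunks."""
    for pattern, replacement in SPLIT_RULES:
        sentence = pattern.sub(replacement, sentence)
    chunks = [chunk.strip() for chunk in sentence.split("\n") if chunk.strip()]
    # Capitalize only the first character so the rest of the chunk is untouched.
    return [chunk[0].upper() + chunk[1:] for chunk in chunks]


print(split_long_sentence("Viens teikums; otrs teikums."))
```

One long sentence becomes two shorter candidates here, which is how a rule of this kind could increase the number of usable sentences.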
Installed `buzz` according to the official instructions ``` sudo apt-get install libportaudio2 sudo snap install buzz ``` and tried to add a transcription with a custom Hugging Face model ``` buzz add --model-type...
Transcription in Latvian sometimes failed with the error `Failed utf-8 codec can't decode byte 0xc4 in position 0: unexpected end of data`. This seems to be referenced in https://github.com/ggerganov/whisper.cpp/issues/1798 where multi-byte...
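The error message is what Python reports when a byte string ends in the middle of a multi-byte UTF-8 character. For example, the Latvian letter "ā" is encoded as the two bytes `0xc4 0x81`, so a chunk that ends after `0xc4` cannot be decoded on its own. The sketch below reproduces the failure and shows one general way to handle it with the standard library's incremental decoder. It is not the fix applied in buzz or whisper.cpp, and the function name `decode_chunks` is hypothetical.

```python
import codecs


def decode_chunks(chunks: list[bytes]) -> str:
    """Decode UTF-8 byte chunks that may split a multi-byte character.

    The incremental decoder buffers an incomplete trailing sequence and
    finishes it once the next chunk arrives.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises if bytes are still missing at the very end.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# "māja" encodes to b"m\xc4\x81ja"; cut it between the two bytes of "ā".
chunks = [b"m\xc4", b"\x81ja"]

try:
    chunks[0].decode("utf-8")
except UnicodeDecodeError as error:
    print("Decoding the first chunk alone fails:", error.reason)

print(decode_chunks(chunks))
```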
The decoder at [/Digital_Matter/Yabby_LoRaWAN/decoder.js](../blob/master/Digital_Matter/Yabby_LoRaWAN/decoder.js) does not work. The payload received from the Yabby Edge devices is a Base64-encoded string. The current decoder function implementation in the console can't process...
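To illustrate the mismatch between a Base64 string and a byte-oriented decoder, the sketch below converts a Base64 payload into a list of byte values before any field parsing would take place. It is written in Python for illustration only, while the decoder in question is JavaScript. The function name `payload_to_bytes` and the sample payload are hypothetical, and the sketch does not model the actual Yabby payload layout.

```python
import base64


def payload_to_bytes(payload_b64: str) -> list[int]:
    """Turn a Base64-encoded payload into a list of byte values (0-255).

    validate=True rejects characters outside the Base64 alphabet instead
    of silently skipping them.
    """
    return list(base64.b64decode(payload_b64, validate=True))


# Hypothetical sample payload, not real device data.
print(payload_to_bytes("AQID"))
```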
Running transcription with the Whisper large model crashes with the error > Create ONNX inference session for model 'large'.. 2024-01-13 22:41:57.360987848 [E:onnxruntime:, inference_session.cc:1798 operator()] Exception during initialization: /onnxruntime_src/onnxruntime/core/optimizer/initializer.cc:43 onnxruntime::Initializer::Initializer(const onnx::TensorProto&, const onnxruntime::Path&)...
Fix for https://github.com/rhasspy/piper-phonemize/issues/6