Generating speech locally in the web browser

lukestanley opened this issue 2 years ago • 31 comments

It would be awesome if Piper's TTS could generate audio locally in the browser, e.g. on an old phone, but the dependency on ONNX and the eSpeak variant makes this tricky. Streaming audio to and from a server is often fine, but generating audio locally would avoid having to set up server infrastructure, and once cached it could be faster, more private, and work offline without caring about network dead spots. It could be great for browser extensions too.

There is an eSpeak-ng "espeakng.js" demo here: https://www.readbeyond.it/espeakng/ With source here: https://github.com/espeak-ng/espeak-ng/tree/master/emscripten

Obviously it's not quite as magical as Piper, but I think it's exciting. I can happily hack stuff together with Python and Docker, but I'm out of my depth compiling things for different architectures, so after having a look I'm backing off for now. Still, I thought I'd share what I learned in case others with the relevant skills are also interested:

Both eSpeak-ng and ONNX Runtime Web have different ways of being compiled, but it turns out they can both run in browsers via Emscripten.

For whatever it's worth, someone else has another way of building a subset here: https://github.com/ianmarmour/espeak-ng.js/tree/main

There are ONNX web runtimes too.

ONNX Runtime Web shares its parent project's really massive Python build helper script, but there is a quite helpful FAQ indicating that static builds are available, demonstrated with build info too: https://onnxruntime.ai/docs/build/web.html https://www.npmjs.com/package/onnxruntime-web

Footnote:

I did have a look at container2wasm for this too, but I couldn't quickly figure out how file input and output would work. I also looked at how Copy.sh's in-browser x86 emulator, v86, can run Arch with a working Docker setup! With v86 there are examples of doing file input and output, but getting everything working for 32-bit x86 seemed too complicated to me, and might be a bit much compared to compiling with Emscripten properly, even if it would potentially be usable for much more than cheekily running arbitrary things in the browser.

P.S: awesome work @synesthesiam !

lukestanley avatar Jan 15 '24 14:01 lukestanley

I believe piper can run in the browser using this. It looks like a patch is required in piper; I wonder if we can get that merged back into this repo so it's easier to build the latest version.

eschmidbauer avatar Jan 22 '24 20:01 eschmidbauer

Wow, I see they got it working! They have a demo here: https://piper.wide.video Amazing work, @jozefchutka!

The file sizes shown in the dropdown are not correct, and the UI has lots of options for trying out models, perhaps more than needed, but it works!! It even worked in Chrome on Android! It ran fairly fast for me after downloading. When testing with the VCTK voice weights, I got a real-time factor of 0.79 on my PC in Firefox (faster at generating the audio than the length of the audio) and 1.1 on my Android phone in Chrome (a bit slower than the actual audio). If it could start playing as soon as it had "enough" of an audio buffer, that would probably be close to real time. I think that's amazing considering it's on-device and runs in all kinds of places. There are lots of things that could be optimised.

This could be made into a great frontend library, possibly a shim, or it might be useful directly for some specific kinds of webapps or extensions, such as TTS extensions or voice chat apps. It won't be as fast on a lot of old devices, but it's already close to working well enough for a lot of use cases.

Regarding getting the https://github.com/wide-video/piper change into this repo, I expect that with a bit of work a reasonable change could be made. I'm not well versed in C++, but it seems the exact change made in https://github.com/wide-video/piper/commit/a8e4c8702ef124a438dc96659904da52cc1aba27 would need to be modified so as not to break existing expected behaviour, and that's probably best done on top of the latest master.
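
As an aside, the real-time factor figures above are just synthesis time divided by the duration of the generated audio; here is a tiny sketch of that arithmetic (the helper name is mine, not from any of the projects discussed):

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Time spent generating audio divided by the audio's duration.
    Below 1.0 means synthesis outpaces playback, which is what you
    want if you ever hope to stream the audio as it is generated."""
    return synthesis_seconds / audio_seconds

# e.g. taking 7.9 s to synthesise 10 s of audio gives an RTF of 0.79
```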

In the "WASM friendly" fork, a new command-line argument "--input" was added. It's used to pass JSON directly on the command line: a JSON object is initialised from the argument instead of being read from stdin, the parts of the code that parse JSON line by line are commented out, but the parts that handle the parsed attributes remain. To integrate it cleanly, I think a command-line argument for supplying JSON without stdin is a good idea, and to avoid repeating code, some of the common logic would probably need extracting. @jozefchutka and @synesthesiam, if you could weigh in on that, it'd be appreciated. Anyway, awesome work!
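
To illustrate the integration idea in miniature, here is a hedged Python sketch of the suggested shape: accept JSON either from a --input argument or line by line from stdin, with the handling logic shared between both paths. All names here are hypothetical stand-ins, not piper's actual C++ code.

```python
import argparse
import json
import sys

def handle_request(obj: dict) -> str:
    # Shared logic, whatever piper would do with a parsed request;
    # here it just echoes the "text" field as a stand-in.
    return obj.get("text", "")

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--input",
                        help="JSON request passed directly on the command line")
    args = parser.parse_args(argv)
    if args.input is not None:
        # WASM-friendly path: no stdin needed
        return [handle_request(json.loads(args.input))]
    # Original behaviour: one JSON object per stdin line
    return [handle_request(json.loads(line)) for line in sys.stdin]
```

Keeping `handle_request` common to both branches is the "extract the shared logic" part of the suggestion.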

@eschmidbauer I have to wonder, how did you find it?

lukestanley avatar Jan 26 '24 23:01 lukestanley

It would be great to have piper compile smoothly into wasm. The last time I tried, it took many manual steps. Merging https://github.com/wide-video/piper/commit/a8e4c8702ef124a438dc96659904da52cc1aba27 is just the tip of the iceberg.

  • a thread discussing some of the issues: https://github.com/rhasspy/piper-phonemize/issues/16
  • build process with the manual steps: https://github.com/wide-video/piper-wasm

jozefchutka avatar Feb 01 '24 09:02 jozefchutka

I would like to share the news that you can run all of the piper models with WebAssembly using sherpa-onnx, one of the subprojects of next-gen Kaldi.

We have created a huggingface space so that you can try it. The address is https://huggingface.co/spaces/k2-fsa/web-assembly-tts-sherpa-onnx-en

The above huggingface space uses the following model from piper: https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_US-libritts_r-medium.tar.bz2


We also have a YouTube video to show you how to do that. https://www.youtube.com/watch?v=IcbbJBf01UI


Everything is open-sourced. If you want to know how web assembly is supported for piper, please see the following pull request: https://github.com/k2-fsa/sherpa-onnx/pull/577


There is one more thing to be improved:

  • [ ] Play the generated audio as it is still generating. It is feasible since it generates audio sentence by sentence and each sentence is processed independently.
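
The play-while-generating idea above can be sketched in a few lines of Python (`synthesize` and `play` are hypothetical stand-ins for whatever engine and audio sink you use):

```python
import re

def split_sentences(text: str):
    # Naive splitter on sentence-final punctuation; a real system
    # would want locale-aware segmentation.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def stream_tts(text, synthesize, play):
    # Hand each sentence's audio to the player as soon as it is
    # ready, instead of waiting for the whole utterance.
    for sentence in split_sentences(text):
        play(synthesize(sentence))
```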

FYI: In addition to running piper models with web assembly using sherpa-onnx, you can also run them on Android, iOS, Raspberry Pi, Linux, Windows, macOS, etc, with sherpa-onnx. All models from piper are supported by sherpa-onnx and you can find the converted models at https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models

csukuangfj avatar Feb 08 '24 15:02 csukuangfj

You can find the files for the above huggingface space at https://huggingface.co/spaces/k2-fsa/web-assembly-tts-sherpa-onnx-en/tree/main

You can see that the wasm module file is only 11.5 MB.

csukuangfj avatar Feb 08 '24 15:02 csukuangfj

@csukuangfj This is great, thanks so much!

eschmidbauer avatar Feb 08 '24 19:02 eschmidbauer

@csukuangfj Superb job! But I wonder, is it possible to extract the voice model from the .data file and load it separately (voice and tokens) in the wasm worker during the init function in JavaScript, to make it possible to load different voices?

gyroing avatar Feb 09 '24 09:02 gyroing

@csukuangfj Superb job! But I wonder, is it possible to extract the voice model from the .data file and load it separately (voice and tokens) in the wasm worker during the init function in JavaScript, to make it possible to load different voices?

Sorry, I don't know whether it is possible. I am very new to WebAssembly (I've only been learning it for 3 days).

csukuangfj avatar Feb 09 '24 12:02 csukuangfj

https://piper.ttstool.com

Piper has been integrated into Read Aloud, and released as a separate extension as well.

The source code is here. Please help out if you can with some of the open issues.

ken107 avatar Mar 25 '24 14:03 ken107

Following @ken107's work, I have updated https://piper.wide.video/ . Instead of compiling the whole of piper into wasm, it is now a 2-step process:

  1. piper-phonemize as wasm (build steps) providing phonemeIds...
  2. ...consumed directly by onnxruntime

This already gives a 4-8x performance improvement when running on the CPU.

Here is the simplest implementation https://piper.wide.video/poc.html
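
For anyone exploring this phonemize-then-onnxruntime split, the glue between the two steps is turning phonemes into the integer ids the ONNX model expects, using the voice config's phoneme_id_map. The pad/BOS/EOS interleaving below is my reading of how piper-style id sequences are assembled, so treat the details as an assumption and check the real piper-phonemize source:

```python
def phonemes_to_ids(phonemes, id_map, pad="_", bos="^", eos="$"):
    # Wrap the sequence in BOS/EOS markers and intersperse the pad
    # symbol between phoneme ids, roughly in the style of
    # piper-phonemize, before feeding the result to onnxruntime.
    ids = list(id_map[bos])
    for phoneme in phonemes:
        ids.extend(id_map[phoneme])
        ids.extend(id_map[pad])
    ids.extend(id_map[eos])
    return ids
```

With a toy map like `{"^": [1], "$": [2], "_": [0], "a": [5]}`, the phoneme string "a" becomes `[1, 5, 0, 2]`.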

jozefchutka avatar Apr 03 '24 13:04 jozefchutka

Sharing my Paste-n-Build solution, based on @jozefchutka's research.

#!/bin/bash
BUILD_DIR=$(pwd)/build-piper

rm -rf $BUILD_DIR && mkdir $BUILD_DIR

TMP=$BUILD_DIR/.tmp
[ ! -d $TMP ] && mkdir $TMP
DOCKERFILE=$TMP/piper_wasm_compile.Dockerfile

cat <<EOF > $DOCKERFILE
FROM debian:stable-slim
RUN apt-get update && \
    apt-get install --yes --no-install-recommends \
    build-essential \
    cmake \
    ca-certificates \
    curl \
    pkg-config \
    git \
    autogen \
    automake \
    autoconf \
    libtool \
    python3 && ln -sf python3 /usr/bin/python
RUN git clone --depth 1 https://github.com/emscripten-core/emsdk.git /modules/emsdk
WORKDIR /modules/emsdk
RUN ./emsdk install 3.1.41 && \
    ./emsdk activate 3.1.41 && \
    rm -rf downloads
WORKDIR /wasm
ENTRYPOINT ["/bin/bash", "-c", "EMSDK_QUIET=1 source /modules/emsdk/emsdk_env.sh  && \"\$@\"", "-s"]
CMD ["/bin/bash"]
EOF

docker buildx build -t piper-wasm-compiler -q -f $DOCKERFILE .

cat <<EOF | docker run --rm -i -v $TMP:/wasm piper-wasm-compiler /bin/bash
[ ! -d espeak-ng ] && git clone --depth 1 https://github.com/rhasspy/espeak-ng.git
cd /wasm/espeak-ng
./autogen.sh
./configure
make

cd /wasm
[ ! -d piper-phonemize ] && git clone --depth 1 https://github.com/wide-video/piper-phonemize.git
cd piper-phonemize && git pull
emmake cmake -Bbuild -DCMAKE_INSTALL_PREFIX=install -DCMAKE_TOOLCHAIN_FILE=\$EMSDK/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake -DBUILD_TESTING=OFF -G "Unix Makefiles" -DCMAKE_CXX_FLAGS="-O3 -s INVOKE_RUN=0 -s MODULARIZE=1 -s EXPORT_NAME='createPiperPhonemize' -s EXPORTED_FUNCTIONS='[_main]' -s EXPORTED_RUNTIME_METHODS='[callMain, FS]' --preload-file /wasm/espeak-ng/espeak-ng-data@/espeak-ng-data"
emmake cmake --build build --config Release # fails on "Compile intonations / Permission denied", continue with next steps
sed -i 's+\$(MAKE) \$(MAKESILENT) -f CMakeFiles/data.dir/build.make CMakeFiles/data.dir/build+#\0+g' /wasm/piper-phonemize/build/e/src/espeak_ng_external-build/CMakeFiles/Makefile2
sed -i 's/using namespace std/\/\/\0/g' /wasm/piper-phonemize/build/e/src/espeak_ng_external/src/speechPlayer/src/speechWaveGenerator.cpp
emmake cmake --build build --config Release
EOF

cp $TMP/piper-phonemize/build/piper_phonemize.* $BUILD_DIR

rm -rf $TMP

This script will automatically build piper_phonemize.data, piper_phonemize.wasm, and piper_phonemize.js, and copy them into the ./build-piper folder.

Under the hood this script will:

  1. Build the smallest Docker image it can (well, 1.5 GB instead of 1.9 GB).
  2. Build piper-phonemize.
  3. Create the ./build-piper folder and copy the wasm artifacts into it.
  4. Clean up all temp files.

iSuslov avatar Apr 06 '24 12:04 iSuslov

https://github.com/HirCoir/HirCoir-Piper-tts-app

HirCoir avatar Apr 08 '24 17:04 HirCoir

@iSuslov can you provide a simple POC to test your work? I'm more of a backend person, but I need to implement this on the web with the fewest dependencies (HTML + JS + wasm if possible, with no additional frameworks like node.js), and I'm a little lost on where to start.

Also @jozefchutka, if you could share the source code for your poc, it would be a good starting point for understanding these artifacts.

Best regards!

puppetm4st3r avatar Apr 25 '24 04:04 puppetm4st3r

but I need to implement this on the web with the fewest dependencies (HTML + JS + wasm if possible, with no additional frameworks like node.js)

@puppetm4st3r Do you want to try sherpa-onnx? It does exactly what you wish for: HTML + JS + wasm. There's no need for any other dependencies.

Doc: https://k2-fsa.github.io/sherpa/onnx/tts/wasm/index.html

huggingface space demo for wasm + tts: https://k2-fsa.github.io/sherpa/onnx/tts/wasm/index.html

(Hint: You can copy the files from the huggingface space directly to your own project.)

csukuangfj avatar Apr 25 '24 04:04 csukuangfj

Thanks! I'm following the doc, but when I try to build the assets for a Spanish model I get this stack trace.

LLVM ERROR: Broken module found, compilation aborted!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/dario/src/emsdk/upstream/bin/wasm-ld -o ../../bin/sherpa-onnx-wasm-main-tts.wasm CMakeFiles/sherpa-onnx-wasm-main-tts.dir/sherpa-onnx-wasm-main-tts.cc.o -L/home/dario/src/tts/sherpa-onnx/build-wasm-simd-tts/_deps/onnxruntime-src/lib ../../lib/libsherpa-onnx-c-api.a ../../lib/libsherpa-onnx-core.a ../../lib/libkaldi-native-fbank-core.a ../../lib/libkaldi-decoder-core.a ../../lib/libsherpa-onnx-kaldifst-core.a ../../_deps/onnxruntime-src/lib/libonnxruntime.a ../../lib/libpiper_phonemize.a ../../lib/libespeak-ng.a /home/dario/src/tts/sherpa-onnx/build-wasm-simd-tts/_deps/onnxruntime-src/lib/libonnxruntime.a -L/home/dario/src/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten ../../lib/libucd.a ../../lib/libsherpa-onnx-fstfar.a ../../lib/libsherpa-onnx-fst.a -lGL-getprocaddr -lal -lhtml5 -lstubs -lnoexit -lc -ldlmalloc -lcompiler_rt -lc++-noexcept -lc++abi-noexcept -lsockets -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr /tmp/tmpwhke9p6slibemscripten_js_symbols.so --strip-debug --export=CopyHeap --export=malloc --export=free --export=MyPrint --export=SherpaOnnxCreateOfflineTts --export=SherpaOnnxDestroyOfflineTts --export=SherpaOnnxDestroyOfflineTtsGeneratedAudio --export=SherpaOnnxOfflineTtsGenerate --export=SherpaOnnxOfflineTtsGenerateWithCallback --export=SherpaOnnxOfflineTtsNumSpeakers --export=SherpaOnnxOfflineTtsSampleRate --export=SherpaOnnxWriteWave --export=_emscripten_stack_alloc --export=__get_temp_ret --export=__set_temp_ret --export=__wasm_call_ctors --export=emscripten_stack_get_current --export=_emscripten_stack_restore --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-table -z stack-size=10485760 --max-memory=2147483648 --initial-memory=536870912 --no-entry --table-base=1 --global-base=1024
 #0 0x0000564a79ff0228 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/dario/src/emsdk/upstream/bin/wasm-ld+0xf86228)
 #1 0x0000564a79fed65e llvm::sys::RunSignalHandlers() (/home/dario/src/emsdk/upstream/bin/wasm-ld+0xf8365e)
 #2 0x0000564a79ff0e7f SignalHandler(int) Signals.cpp:0:0
 #3 0x00007722bf8f7520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007722bf94b9fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
 #5 0x00007722bf8f7476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
 #6 0x00007722bf8dd7f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
 #7 0x0000564a79f5e4c3 llvm::report_fatal_error(llvm::Twine const&, bool) (/home/dario/src/emsdk/upstream/bin/wasm-ld+0xef44c3)
 #8 0x0000564a79f5e306 (/home/dario/src/emsdk/upstream/bin/wasm-ld+0xef4306)
 #9 0x0000564a7ad5186e (/home/dario/src/emsdk/upstream/bin/wasm-ld+0x1ce786e)
#10 0x0000564a7c873a82 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/dario/src/emsdk/upstream/bin/wasm-ld+0x3809a82)
#11 0x0000564a7ad4aae1 llvm::lto::opt(llvm::lto::Config const&, llvm::TargetMachine*, unsigned int, llvm::Module&, bool, llvm::ModuleSummaryIndex*, llvm::ModuleSummaryIndex const*, std::__2::vector<unsigned char, std::__2::allocator<unsigned char>> const&) (/home/dario/src/emsdk/upstream/bin/wasm-ld+0x1ce0ae1)
#12 0x0000564a7ad4ca42 llvm::lto::backend(llvm::lto::Config const&, std::__2::function<llvm::Expected<std::__2::unique_ptr<llvm::CachedFileStream, std::__2::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex&) (/home/dario/src/emsdk/upstream/bin/wasm-ld+0x1ce2a42)
#13 0x0000564a7ad3c2aa llvm::lto::LTO::runRegularLTO(std::__2::function<llvm::Expected<std::__2::unique_ptr<llvm::CachedFileStream, std::__2::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>) (/home/dario/src/emsdk/upstream/bin/wasm-ld+0x1cd22aa)
#14 0x0000564a7ad3b5c9 llvm::lto::LTO::run(std::__2::function<llvm::Expected<std::__2::unique_ptr<llvm::CachedFileStream, std::__2::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, std::__2::function<llvm::Expected<std::__2::function<llvm::Expected<std::__2::unique_ptr<llvm::CachedFileStream, std::__2::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>> (unsigned int, llvm::StringRef, llvm::Twine const&)>) (/home/dario/src/emsdk/upstream/bin/wasm-ld+0x1cd15c9)
#15 0x0000564a7a3eca96 lld::wasm::BitcodeCompiler::compile() (/home/dario/src/emsdk/upstream/bin/wasm-ld+0x1382a96)
#16 0x0000564a7a3eea74 lld::wasm::SymbolTable::compileBitcodeFiles() (/home/dario/src/emsdk/upstream/bin/wasm-ld+0x1384a74)
#17 0x0000564a7a3d6135 lld::wasm::(anonymous namespace)::LinkerDriver::linkerMain(llvm::ArrayRef<char const*>) Driver.cpp:0:0
#18 0x0000564a7a3d1035 lld::wasm::link(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, bool, bool) (/home/dario/src/emsdk/upstream/bin/wasm-ld+0x1367035)
#19 0x0000564a79ff325e lld::unsafeLldMain(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, llvm::ArrayRef<lld::DriverDef>, bool) (/home/dario/src/emsdk/upstream/bin/wasm-ld+0xf8925e)
#20 0x0000564a79f37481 lld_main(int, char**, llvm::ToolContext const&) (/home/dario/src/emsdk/upstream/bin/wasm-ld+0xecd481)
#21 0x0000564a79f37e64 main (/home/dario/src/emsdk/upstream/bin/wasm-ld+0xecde64)
#22 0x00007722bf8ded90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
#23 0x00007722bf8dee40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)
#24 0x0000564a79eace2a _start (/home/dario/src/emsdk/upstream/bin/wasm-ld+0xe42e2a)
em++: error: '/home/dario/src/emsdk/upstream/bin/wasm-ld -o ../../bin/sherpa-onnx-wasm-main-tts.wasm CMakeFiles/sherpa-onnx-wasm-main-tts.dir/sherpa-onnx-wasm-main-tts.cc.o -L/home/dario/src/tts/sherpa-onnx/build-wasm-simd-tts/_deps/onnxruntime-src/lib ../../lib/libsherpa-onnx-c-api.a ../../lib/libsherpa-onnx-core.a ../../lib/libkaldi-native-fbank-core.a ../../lib/libkaldi-decoder-core.a ../../lib/libsherpa-onnx-kaldifst-core.a ../../_deps/onnxruntime-src/lib/libonnxruntime.a ../../lib/libpiper_phonemize.a ../../lib/libespeak-ng.a /home/dario/src/tts/sherpa-onnx/build-wasm-simd-tts/_deps/onnxruntime-src/lib/libonnxruntime.a -L/home/dario/src/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten ../../lib/libucd.a ../../lib/libsherpa-onnx-fstfar.a ../../lib/libsherpa-onnx-fst.a -lGL-getprocaddr -lal -lhtml5 -lstubs -lnoexit -lc -ldlmalloc -lcompiler_rt -lc++-noexcept -lc++abi-noexcept -lsockets -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr /tmp/tmpwhke9p6slibemscripten_js_symbols.so --strip-debug --export=CopyHeap --export=malloc --export=free --export=MyPrint --export=SherpaOnnxCreateOfflineTts --export=SherpaOnnxDestroyOfflineTts --export=SherpaOnnxDestroyOfflineTtsGeneratedAudio --export=SherpaOnnxOfflineTtsGenerate --export=SherpaOnnxOfflineTtsGenerateWithCallback --export=SherpaOnnxOfflineTtsNumSpeakers --export=SherpaOnnxOfflineTtsSampleRate --export=SherpaOnnxWriteWave --export=_emscripten_stack_alloc --export=__get_temp_ret --export=__set_temp_ret --export=__wasm_call_ctors --export=emscripten_stack_get_current --export=_emscripten_stack_restore --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-table -z stack-size=10485760 --max-memory=2147483648 --initial-memory=536870912 --no-entry --table-base=1 --global-base=1024' failed 
(received SIGABRT (-6))
make[2]: *** [wasm/tts/CMakeFiles/sherpa-onnx-wasm-main-tts.dir/build.make:111: bin/sherpa-onnx-wasm-main-tts.js] Error 1
make[1]: *** [CMakeFiles/Makefile2:1281: wasm/tts/CMakeFiles/sherpa-onnx-wasm-main-tts.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

I installed cmake with apt-get, and then with pip; both got me that stack trace... Do you know what could have happened?

The model selected was: https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-es_MX-claude-high.tar.bz2

Best regards!

puppetm4st3r avatar Apr 25 '24 05:04 puppetm4st3r

How much RAM does your computer have?

Could you try make -j1?

@puppetm4st3r

csukuangfj avatar Apr 25 '24 06:04 csukuangfj

64 GB, at least 90% free. Will try!

puppetm4st3r avatar Apr 25 '24 06:04 puppetm4st3r

64 GB, at least 90% free. Will try!

Does it work now?

csukuangfj avatar Apr 25 '24 07:04 csukuangfj

yes! thanks!

puppetm4st3r avatar Apr 25 '24 08:04 puppetm4st3r

@iSuslov can you provide a simple POC to test your work? I'm more of a backend person, but I need to implement this on the web with the fewest dependencies (HTML + JS + wasm if possible, with no additional frameworks like node.js), and I'm a little lost on where to start.

Hey @puppetm4st3r, I see your issue is resolved, but in case my script seems confusing, I would like to clarify:

  1. Open bash terminal. Go to any folder.
  2. Copy-paste script.

The script will download and compile everything it needs, producing a wasm build in the same folder. Docker must be preinstalled.

iSuslov avatar Apr 26 '24 03:04 iSuslov

Thanks! Now I've got another issue: when I compile with your script, @iSuslov, it works like a charm in a desktop web browser, but it did not work on iOS, failing with an OOM error. When I tried the other solution from @csukuangfj it worked on the iPhone, but I can't get it running for Spanish models with @csukuangfj's method. I'm stuck :(

puppetm4st3r avatar Apr 26 '24 03:04 puppetm4st3r

but I can't get it running for Spanish models with @csukuangfj's method.

Could you describe in detail why you cannot run it?

csukuangfj avatar Apr 26 '24 04:04 csukuangfj

When I tried your advice it ultimately didn't work; it was a false positive, my mistake: the cache wouldn't refresh and I was actually testing the solution from @iSuslov. It still gives me the stack trace that I attached here. But if I clone your sample code with the English model, it works (with no build process, just the sample code with the wasm binaries). I tried compiling inside a clean Docker container and outside Docker on my machine; neither worked.

The script from @iSuslov works, but when I tried it on iOS it crashed with OOM; your sample from the HF space works on iOS without problems.

puppetm4st3r avatar Apr 26 '24 04:04 puppetm4st3r

I tried compiling inside a clean Docker container and outside Docker on my machine; neither worked

It would be great if you could post error logs. Otherwise, we don't know what you mean when you say it didn't work. @puppetm4st3r

csukuangfj avatar Apr 26 '24 05:04 csukuangfj

@puppetm4st3r just out of curiosity, when you say you're testing it on iOS, do you mean Safari on an iPhone? I've never faced any OOM issues with wasm. Maybe there is an issue in how the script is loaded.

iSuslov avatar Apr 26 '24 06:04 iSuslov

I tried on iOS (iPhone, Safari/Chrome), but I realized it is not the wasm. For some very strange reason, if I test on my device using my private network address (192.168.x.x) everything works fine, I just discovered. However, when accessing the same device via the router's public IP, it fails with an OOM error, which makes no sense. I will remotely debug the iPhone and bring you the logs and evidence, to leave the case documented in case it is of use to someone. I hope I can solve it, now that I know it is apparently an infrastructure problem...

puppetm4st3r avatar Apr 26 '24 14:04 puppetm4st3r

@csukuangfj I will post the logs later (they are very long); maybe I'll upload them to Drive or something...

puppetm4st3r avatar Apr 26 '24 14:04 puppetm4st3r

@iSuslov additionally, I have tested on Android and it works fine; the problem is with iOS when exposing the service through the cloud, so I think it is a problem with the infra. But I still can't build with the guide from @csukuangfj (I still have to attach the logs of the build process).

puppetm4st3r avatar Apr 26 '24 21:04 puppetm4st3r

For those looking for a ready-to-use solution, I have compiled all the knowledge shared in this thread into this library: https://github.com/diffusion-studio/vits-web .

Thanks to everyone here for the awesome solutions and code snippets!

k9p5 avatar Jul 06 '24 21:07 k9p5

Re "in the web browser": this is tricky because we have to find some way to load these voice files each time the voice is used, on each origin where the voice is used.

There is Native Messaging, where we can run, control, and communicate to and from native applications from the browser.

This native-messaging-espeak-ng is one variation of what I've been doing with eSpeak-NG for years now, mainly because I wanted to support SSML input (see SSMLParser), which I don't see mentioned here at all.

What this (using Native Messaging) means is that we don't have to compile anything to WASM. We can use piper as-is, send input to piper, and send the output to the browser.
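
For reference, the native messaging wire format used by Chromium and Firefox frames each message as a 32-bit length prefix in native byte order followed by UTF-8 encoded JSON; a host (for example, one wrapping a piper process) just reads and writes that framing on stdin/stdout. A minimal sketch of the framing itself:

```python
import json
import struct

def pack_message(obj) -> bytes:
    # Native messaging wire format: a 4-byte native-endian length
    # prefix, then the UTF-8 encoded JSON body.
    body = json.dumps(obj).encode("utf-8")
    return struct.pack("=I", len(body)) + body

def unpack_message(data: bytes):
    # Inverse of pack_message: read the length prefix, decode the body.
    (length,) = struct.unpack("=I", data[:4])
    return json.loads(data[4:4 + length].decode("utf-8"))
```

A real host would loop, reading 4 bytes then `length` bytes from stdin, and write each reply with the same framing.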

guest271314 avatar Jul 14 '24 14:07 guest271314