Support building Kaldi to WASM with OpenBLAS

Open msqr1 opened this issue 1 year ago • 12 comments

Build Kaldi to WASM with OpenBLAS 0.3.28, with some minor hacks; performance increased by 20% (#4952)

msqr1 avatar Oct 15 '24 15:10 msqr1

@jtrmal PTAL at the changes as well as the guide itself here: https://github.com/msqr1/kaldi-wasm2. I also have to force the number of threads spawned by Kaldi to be 1, because multithreading in WASM is quite complicated (we can support that later). I know g_num_threads controls this, but is there any other place where Kaldi spawns threads?

Thanks!

msqr1 avatar Oct 16 '24 05:10 msqr1

Hi, thank you, will try to get to this this week y.

jtrmal avatar Oct 16 '24 07:10 jtrmal

Could you answer my question so I can work on it?

msqr1 avatar Oct 16 '24 17:10 msqr1

I'm sorry I don't see any question. Y.

jtrmal avatar Oct 16 '24 18:10 jtrmal

Is there any other place where Kaldi spawns threads, apart from what g_num_threads controls in kaldi_thread.cc?

msqr1 avatar Oct 16 '24 18:10 msqr1

I think just from libraries, e.g. some math libraries, like MKL, spawn their own threads. (This is usually not helpful and should be disabled by appropriate environment variables or library versions.)
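As a minimal sketch of the environment-variable approach mentioned above: the three variables below are the standard thread-count controls read by OpenBLAS, Intel MKL, and OpenMP runtimes. Which of them matters depends on the math library Kaldi was linked against, and the commented-out binary name is hypothetical.

```shell
# Limit common math libraries to a single thread before launching a program.
# Each variable is only read by its own library; setting the others is harmless.
export OPENBLAS_NUM_THREADS=1   # OpenBLAS
export MKL_NUM_THREADS=1        # Intel MKL
export OMP_NUM_THREADS=1        # OpenMP runtimes

# Hypothetical invocation; substitute the Kaldi program you actually run.
# ./some-kaldi-binary --help
```

The variables have to be exported in the same shell (or job script) that starts the program, since they are read when the library initializes.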

danpovey avatar Oct 17 '24 03:10 danpovey

OK, so I can force Kaldi to use a single thread by setting g_num_threads to 1. When building to WASM, I will also have to limit every place that creates a std::thread (except the CUDA ones) to a single thread, right?

Btw, what is the difference between g_num_threads =1 vs =0? @danpovey

msqr1 avatar Oct 17 '24 04:10 msqr1

The vast majority of Kaldi programs only use one thread anyway so you probably don't have to do anything in most cases.

danpovey avatar Oct 17 '24 08:10 danpovey

By the way, sherpa-onnx also uses a single thread in its WebAssembly ASR and TTS apps. The speed also looks OK, e.g., it is able to do real-time speech recognition.

csukuangfj avatar Oct 17 '24 09:10 csukuangfj

Thanks! I will TAL at that later. For now, I'm just fixing the threading issue to get this done!

msqr1 avatar Oct 17 '24 17:10 msqr1

I wouldn't attempt to compile the entirety of Kaldi to WASM because the binary size would be enormous. There are lots of templates and many libraries. I'd compile a single binary at a time. IDK much about how WASM works though, or how the linking etc. is done.

danpovey avatar Oct 18 '24 02:10 danpovey

Would you PTAL again @danpovey, @jtrmal?

msqr1 avatar Oct 20 '24 03:10 msqr1

Would you be willing to make the tests runnable, at least the ones from the directories you need to compile online2? I'm not sure how much work that would be. Ideally they would run in a console without needing a browser, but running them in a browser would also be fine. y.

jtrmal avatar Oct 21 '24 08:10 jtrmal

ideally running in a console without needing a browser

Yes, I think that is possible.

You can also run WASM with Node.js in a console. We have been doing this with sherpa-onnx, and we even provide an npm package with WASM.
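A rough sketch of the console workflow, assuming emcc (from an activated emsdk) and node are both on PATH. The hello.c file is a hypothetical stand-in created only for illustration; it is not a Kaldi test, and a real Kaldi test would need the Kaldi libraries linked in.

```shell
# Build a tiny C program with Emscripten and run it under Node.js.
workdir=$(mktemp -d)
cat > "$workdir/hello.c" <<'EOF'
#include <stdio.h>
int main(void) { puts("hello from wasm"); return 0; }
EOF

if command -v emcc >/dev/null 2>&1 && command -v node >/dev/null 2>&1; then
  # With a .js output name, emcc emits hello.js (the loader) and hello.wasm.
  emcc "$workdir/hello.c" -o "$workdir/hello.js"
  # Node.js loads the generated JavaScript, which in turn runs the WASM module.
  node "$workdir/hello.js"
else
  echo "emcc or node not found; skipping the build" >&2
fi
```

A test that reads model or data files would probably need additional Emscripten file-system settings, which this sketch leaves out.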

csukuangfj avatar Oct 21 '24 10:10 csukuangfj

I'll try to do that

msqr1 avatar Oct 21 '24 17:10 msqr1

@msqr1 I have a comment regarding your guide at https://github.com/msqr1/kaldi-wasm2/tree/main -- In the line

CC=emcc HOSTCC=clang-20 TARGET=RISCV64_GENERIC USE_THREAD=0 NO_SHARED=1 BINARY=32 BUILD_SINGLE=1 BUILD_DOUBLE=1 BUILD_BFLOAT16=0 BUILD_COMPLEX16=0 BUILD_COMPLEX=0 CFLAGS='-fno-exceptions -fno-rtti' make -j$(nproc)

Is clang-20 necessary? AFAIK that's still a WIP, unreleased version from git, and as such it will be a lot of hassle for your users to get. Also, I was a bit surprised by the TARGET being RISC-V -- is that correct? Is WASM compatible with RISC-V?

I tried clang-18 and it failed on Ubuntu 24.04 in Docker on an Apple M3:

29.07 gfortran -O3 -Wall -frecursive -fno-optimize-sibling-calls  -fno-tree-vectorize  -o sblat1 sblat1.o ../libopenblas_riscv64_generic-r0.3.28.a -lgfortran -lgfortran -L/opt/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten  -lGL-getprocaddr -lal -lstubs-debug -lnoexit -lc-debug -ldlmalloc -lc++-noexcept -lc++abi-debug-noexcept -lsockets
29.08 /usr/bin/ld: /opt/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libc-debug.a: error adding symbols: file format not recognized
29.08 collect2: error: ld returned 1 exit status
29.08 make[1]: *** [Makefile:320: sblat1] Error 1
29.08 make[1]: *** Waiting for unfinished jobs....
29.84 make[1]: Leaving directory '/opt/openblas/test'
29.84 make: *** [Makefile:171: tests] Error 2

but that might just be an OpenBLAS issue...

jtrmal avatar Oct 22 '24 12:10 jtrmal

I was able to compile openblas using

CC=emcc HOSTCC=gcc TARGET=RISCV64_GENERIC USE_THREAD=0 NO_SHARED=1 NOFORTRAN=1 BINARY=64 BUILD_SINGLE=1 BUILD_DOUBLE=1 BUILD_BFLOAT16=0 BUILD_COMPLEX16=0 BUILD_COMPLEX=0 CFLAGS='-fno-exceptions -fno-rtti' make -j$(nproc)
  • setting NOFORTRAN=1 was important (actually not sure if kaldi will compile)
  • why BINARY=32 in your original setup? 4G of memory might not be enough for bigger models

jtrmal avatar Oct 22 '24 14:10 jtrmal

@jtrmal

In the line

CC=emcc HOSTCC=clang-20 TARGET=RISCV64_GENERIC USE_THREAD=0 NO_SHARED=1 BINARY=32 BUILD_SINGLE=1 BUILD_DOUBLE=1 BUILD_BFLOAT16=0 BUILD_COMPLEX16=0 BUILD_COMPLEX=0 CFLAGS='-fno-exceptions -fno-rtti' make -j$(nproc)

Is the clang-20 necessary? AFAIK that's still WIP unreleased version from git and as such it will be a lot of hassle for your users to get it.

Oh, my bad. Cross-compiling OpenBLAS requires the native compiler to be passed in, which, in my case, is clang-20. The default is gcc. Ideally, though, we would want emcc to do everything here without the need for a native compiler, but I still haven't figured out how to do this. I will try again with emcc or gcc.

Also I was a bit surprised by the TARGET being riscV -- is that correct? Is WASM compatible with RISCV?

OpenBLAS is tailored to each target machine because it uses that machine's assembly, as you can see from the .S files for each target. To compile to WASM, we have to pick a target that uses pure C files, and RISCV64_GENERIC seems to be the only one. See OpenMathLib/OpenBLAS#3640

why BINARY=32 in your original setup? 4G of memory might not be enough for bigger models

Browsers barely support WASM memory larger than 4G by default. WASM64 (where the 64-bit pointer size is the only difference) is not standardized yet, and Emscripten still marks it as experimental. Besides, I don't think we should be running super heavy models in a browser.
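For reference, the 4G figure follows directly from the 32-bit pointer size. A small sketch of the arithmetic, assuming a shell with 64-bit integer arithmetic (the variable names are illustrative only):

```shell
# A 32-bit pointer can address 2^32 distinct bytes.
bytes=$(( 1 << 32 ))
gib=$(( bytes / 1024 / 1024 / 1024 ))
# WebAssembly linear memory is allocated in 64 KiB pages.
pages=$(( bytes / 65536 ))
echo "wasm32 address space: ${bytes} bytes = ${gib} GiB = ${pages} pages"
```

This is only the upper bound of the address space; how much of it a given build may actually use depends on the Emscripten memory settings and on the browser.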

msqr1 avatar Oct 22 '24 15:10 msqr1

Ah, thanks for the explanation, that makes sense. Y.

jtrmal avatar Oct 22 '24 16:10 jtrmal

I edited it to use gcc by default. Also, as for the test, I don't know if it belongs in this PR. I think this one should just focus on adding support for WASM. In my opinion, a repo-wide test for WASM belongs in a separate PR. @jtrmal

msqr1 avatar Oct 28 '24 04:10 msqr1

I suggest that you write a simple test to cover your changes. No need for a repo-wide test in this PR.

csukuangfj avatar Oct 28 '24 05:10 csukuangfj

So I should write a test for the math routines? In kaldi source I really only changed the configure, the README, the makefile for OpenBLAS, and the number of threads.

msqr1 avatar Oct 28 '24 14:10 msqr1

You don't have to write anything -- perhaps just compile and run one of the *-test.cc files in the nnet2 directory? Maybe nnet-example-functions-test.cc, but it's really up to you. nnet2 is a directory online2 depends on, so the compiler already passes through it. There is a target "test", but you can also add your own target (wasm-test?). y.

jtrmal avatar Oct 29 '24 08:10 jtrmal

Feel free to hardcode everything into the "wasm-test" target if you choose to go that route -- the purpose is not to bog you down, but essentially to have an example showing other folks later how to run other tests they might need. y.

jtrmal avatar Oct 29 '24 08:10 jtrmal

I'm sorry this took a long time, I was very busy. Anyways, I ran an example test on my guide (nnet2/am-nnet-test.cc), PTAL once more @jtrmal

msqr1 avatar Nov 12 '24 18:11 msqr1

No worries about the time. I'll try to look this week, putting this on my todo.

jtrmal avatar Nov 13 '24 06:11 jtrmal

What's our progress here? @jtrmal

msqr1 avatar Nov 27 '24 01:11 msqr1

@jtrmal

msqr1 avatar Dec 16 '24 18:12 msqr1

@danpovey can you help me land this?

msqr1 avatar Jan 02 '25 03:01 msqr1

There are no CI tests for this PR. Shall we add at least one test?

Changes should be covered by CI tests, I suggest.

csukuangfj avatar Jan 02 '25 07:01 csukuangfj