Slow sentence encoding models

Open kali opened this issue 2 years ago • 10 comments

Discussed in https://github.com/sonos/tract/discussions/1020

Originally posted by ivanstepanovftw, March 27, 2023:

I have exported sentence-transformers/all-MiniLM-L6-v2 (~60 MiB), and encoding is very slow. Encoding a sentence takes 3 seconds on bare metal and 10 seconds in WASM, while PyTorch takes about 0.03 seconds.

Bare metal results:

model loaded in 308 ms
model run in 3929 ms
outputs[0]: 1,128,384,F32 0.22568491, 0.23463942, 0.055242293, 0.15074176, -0.23616529, -0.10899622, 0.099527195, -0.26645622, -0.041477088, 0.023107085, 0.13349931, 0.037163362...

Source code: sentence-transformers.zip

kali avatar Mar 28 '23 18:03 kali

The latest fixes on main seem to fix the problem. I now see:

model loaded in 172 ms
model run in 43 ms
outputs[0]: 1,128,384,F32 0.22568443, 0.2346393, 0.055243187, 0.15074156, -0.2361654, -0.10899601, 0.09952735, -0.26645604, -0.04147695, 0.023107084, 0.13349901, 0.037163444...

kali avatar Apr 06 '23 13:04 kali

@kali Thanks for the work on this project. I'm not seeing the same speeds with exactly the same code from your zip file. I'm on an Apple M2 Max, and inference takes around 500 ms in a release build.

The only changes I made were in the cargo file so it could compile standalone:

[dependencies]
ndarray = "0.16.1"
tokenizers = { version = "0.21.0", default-features = false, features = ["unstable_wasm"] }
tract-onnx = { version = "0.19.8" }
getrandom = { version = "0.2.15", features = ["js"] }

Do you have any suggestions for improving the performance?

With WebAssembly I am seeing this on Firefox:

Model loaded successfully 250 ms [hello_wasm_bg.js:289](webpack:///pkg/hello_wasm_bg.js)
model run in 758 ms

and this on Chrome

Model loaded successfully 295.69999998807907 ms
hello_wasm_bg.js:289 model run in 1149.300000011921 ms

andrenatal avatar Jan 04 '25 08:01 andrenatal

Hello, sorry for this regression. Is this Wasm-specific, or is it impacting your native runtime on the M2 Max as well?

kali avatar Jan 04 '25 08:01 kali

Hi @kali, both: native as well as Wasm, on Firefox and on Chrome.

andrenatal avatar Jan 04 '25 08:01 andrenatal

This is the output from native:

model loaded in 184 ms
model run in 544 ms
outputs[0]: 1,128,384,F32 0.2256844, 0.23463915, 0.055242732, 0.15074109, -0.23616523, -0.108996294, 0.09952743, -0.26645616, -0.041477047, 0.023107152, 0.13349889, 0.03716349...

andrenatal avatar Jan 04 '25 08:01 andrenatal

Thanks, I will try to have a look next week. I assume it comes from the introduction of the matmul kit abstraction. In the background, we are also thinking about introducing performance measurement checks in the CI, as these performance regressions are happening a bit too often for my taste. But it's non-trivial, as performance on VMs can be pretty erratic and GHA does not provide "real" Mac machines.
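To illustrate one common way of coping with erratic timings, here is a minimal sketch using only the Rust standard library. It is not tract's CI tooling; the `measure` and `median` helpers and the stand-in workload are hypothetical names made up for this example. The idea is to run a few untimed warm-up iterations, then collect several timed samples and report the median rather than a single run, which tends to be less sensitive to outliers on noisy machines.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `warmup` times untimed, then `samples` times timed.
/// Returns the timings sorted in ascending order.
fn measure<F: FnMut()>(mut f: F, warmup: usize, samples: usize) -> Vec<Duration> {
    for _ in 0..warmup {
        f();
    }
    let mut timings: Vec<Duration> = (0..samples)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    timings.sort();
    timings
}

/// Median of an already sorted, non-empty slice
/// (the upper of the two middle values when the length is even).
fn median(sorted: &[Duration]) -> Duration {
    sorted[sorted.len() / 2]
}

fn main() {
    // Hypothetical stand-in workload; in practice this closure would
    // wrap the model's run call.
    let mut acc = 0u64;
    let workload = || {
        for i in 0..100_000u64 {
            acc = acc.wrapping_add(std::hint::black_box(i));
        }
    };

    let timings = measure(workload, 3, 15);
    std::hint::black_box(acc);

    println!(
        "min {:?}, median {:?}, max {:?}",
        timings[0],
        median(&timings),
        timings[timings.len() - 1]
    );
}
```

A median over repeated runs will not make a shared VM deterministic, but comparing medians (or minimums) between two builds on the same machine is usually more stable than comparing single runs.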

kali avatar Jan 04 '25 08:01 kali

Yeah, I totally agree. These GitHub Actions runners are all over the place and not ideal for measuring performance. I once needed to measure the performance of an inference stack in WebAssembly there, and it wasn't deterministic at all, let alone that it would fail from time to time for no reason.

Let me know how I can help. I'd be happy to.

andrenatal avatar Jan 04 '25 09:01 andrenatal

Hello. I think there is some confusion here somewhere. I observe good performance with both tract's current top of tree and version 0.21.8. In your previous comment, I can see you left the tract-onnx version in Cargo.toml at 0.19.8. Can you try bumping it to 0.21 and see if the problem remains?

kali avatar Jan 06 '25 12:01 kali

Sure, I will try today. I stayed on 0.19.8 because I was encountering the referenced issue with 0.21.8 on the exact same codebase.

andrenatal avatar Jan 06 '25 18:01 andrenatal

Hi @kali, I confirm that https://github.com/sonos/tract/issues/1612#issuecomment-2574039060 was the issue here for native.

As for Wasm, are you seeing the same numbers I posted above? Are those expected?

Firefox:

Model loaded successfully 250 ms [hello_wasm_bg.js:289](webpack:///pkg/hello_wasm_bg.js)
model run in 758 ms

Chrome

Model loaded successfully 295.69999998807907 ms hello_wasm_bg.js:289 
model run in 1149.300000011921 ms

andrenatal avatar Jan 06 '25 22:01 andrenatal