simdutf

add multithreading for large inputs?

Open ronag opened this issue 11 months ago • 3 comments

Would be cool to have built-in support for something like tbb's parallel_for for large inputs, similar to how we use scalar vs SIMD depending on size.

ronag avatar Jan 07 '25 16:01 ronag

@ronag How large are you thinking about?

It takes thousands of nanoseconds to start a thread. Up to, say, 200,000 ns on some systems (it varies greatly). And you haven't done anything yet, you have just started the thread. And there is still overhead to join the thread once it is done.

If you have, say, a gigabyte of data, then you can go faster... if you have kilobytes, that's doubtful in my opinion. In the megabyte range, it is an open question (and depends on the host system).
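To put a number on the start and join overhead for a given machine, a rough sketch like the following could be used. The function name `average_spawn_join_ns` is made up for illustration, and the result depends heavily on the OS, the hardware, and the current load, so treat it as a ballpark only.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Average cost, in nanoseconds, of starting one std::thread that does no
// work and then joining it. Hypothetical helper, not part of simdutf.
inline std::int64_t average_spawn_join_ns(int repetitions) {
  using clock = std::chrono::steady_clock;
  const auto start = clock::now();
  for (int i = 0; i < repetitions; ++i) {
    std::thread worker([] {});  // empty task: only the overhead is paid
    worker.join();              // joining is part of the measured cost
  }
  const auto elapsed = clock::now() - start;
  const auto total_ns =
      std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
  return static_cast<std::int64_t>(total_ns) / repetitions;
}
```

Calling `average_spawn_join_ns(1000)` and comparing the result with the time the library needs to process an input of a given size gives a first idea of where threading could start to pay off.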

Note that I am not dismissing the issue nor being argumentative.

lemire avatar Jan 07 '25 17:01 lemire

I think you are missing the point. Frameworks like tbb implement an efficient thread pool and remove much of the overhead of context switching, thread creation, join, etc. Would need some tests ofc to see at which sizes it makes sense. But I've even used it for memcpy many years ago in latency sensitive applications. I'm thinking this might make sense if the overhead is negligible at sizes of ~128k.
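As a rough sketch of what size-based dispatch could look like (not how simdutf does anything today): everything here is an assumption for illustration, including the names `count_code_points`, `count_code_points_dispatch` and `kParallelThreshold`, and the scalar loop standing in for a real kernel. It uses `std::async` only to stay dependency-free, which spawns threads instead of reusing a pool; with tbb, the chunk loop would be replaced by something like `tbb::parallel_reduce` over a blocked range.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Scalar stand-in for a real kernel: counts UTF-8 code points by counting
// bytes that are not continuation bytes (10xxxxxx). The check is per byte,
// so the input can be split at any offset without boundary handling.
inline std::size_t count_code_points(const char* data, std::size_t length) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < length; ++i) {
    count += (static_cast<unsigned char>(data[i]) & 0xC0) != 0x80;
  }
  return count;
}

// Hypothetical cutoff, taken from the ~128k figure above; needs measuring.
constexpr std::size_t kParallelThreshold = 128 * 1024;

inline std::size_t count_code_points_dispatch(const char* data,
                                              std::size_t length,
                                              std::size_t workers = 4) {
  // Small inputs stay on the calling thread.
  if (length < kParallelThreshold || workers < 2) {
    return count_code_points(data, length);
  }
  // Split into roughly equal chunks, one task per chunk.
  const std::size_t chunk = (length + workers - 1) / workers;
  std::vector<std::future<std::size_t>> parts;
  for (std::size_t begin = 0; begin < length; begin += chunk) {
    const std::size_t size = std::min(chunk, length - begin);
    parts.push_back(std::async(std::launch::async, count_code_points,
                               data + begin, size));
  }
  // Combine the partial counts.
  std::size_t total = 0;
  for (auto& part : parts) {
    total += part.get();
  }
  return total;
}
```

Counting is easy to split because each byte is judged on its own. Validation or transcoding would need the chunk boundaries aligned to code point starts, and transcoding also needs to know the output offsets per chunk, so those are harder cases.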

ronag avatar Jan 07 '25 17:01 ronag

@ronag

> I'm thinking this might make sense if the overhead is negligible at sizes of ~128k.

I agree and it answers my question (How large are you thinking about?). It would be interesting to run experiments.

lemire avatar Jan 07 '25 18:01 lemire