How to implement custom multi-threaded CPU kernels?
I read the documentation at https://ml-explore.github.io/mlx/build/html/dev/extensions.html#implementing-the-cpu-back-end because I need to develop a new primitive for my project. The example custom CPU kernel there is single-threaded. How can I implement a multi-threaded CPU kernel? Is there an example?
As you mention, there is currently no out-of-the-box multi-threading support in the CPU backend. You can handle this in one of two ways:
- Use multiple streams and handle multi-threading at the op level.
- Create a static thread pool for your primitive, submit tasks to it yourself, and have the primitive wait for their completion.
Let us know if you need more help on how to do either.
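For the second option, here is a minimal sketch of what a static thread pool could look like, using only the C++ standard library. Everything in it (`ThreadPool`, `parallel_for`, `kernel_pool`, `axpby_parallel`) is a hypothetical name made up for illustration and not part of the MLX API. It assumes your kernel works on contiguous buffers that can be split into independent index ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical helper (not part of MLX): a fixed-size pool of worker threads.
class ThreadPool {
 public:
  explicit ThreadPool(size_t n_threads) {
    assert(n_threads > 0);
    for (size_t i = 0; i < n_threads; ++i) {
      workers_.emplace_back([this] { worker_loop(); });
    }
  }

  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> lk(mtx_);
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& w : workers_) w.join();
  }

  // Split [0, n) into contiguous chunks, run fn(begin, end) on the workers,
  // and block the caller until every chunk has finished.
  // Must not be called from inside a pool task, or it can deadlock.
  void parallel_for(size_t n, const std::function<void(size_t, size_t)>& fn) {
    if (n == 0) return;
    size_t chunk = (n + workers_.size() - 1) / workers_.size();
    size_t remaining = (n + chunk - 1) / chunk;  // number of chunks queued
    std::mutex done_mtx;
    std::condition_variable done_cv;
    {
      std::lock_guard<std::mutex> lk(mtx_);
      for (size_t begin = 0; begin < n; begin += chunk) {
        size_t end = std::min(n, begin + chunk);
        tasks_.push([&, begin, end] {
          fn(begin, end);
          // Notify while holding the lock so the caller cannot return
          // (and destroy done_cv) before the notification is delivered.
          std::lock_guard<std::mutex> done_lk(done_mtx);
          if (--remaining == 0) done_cv.notify_one();
        });
      }
    }
    cv_.notify_all();
    std::unique_lock<std::mutex> lk(done_mtx);
    done_cv.wait(lk, [&] { return remaining == 0; });
  }

 private:
  void worker_loop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lk(mtx_);
        cv_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
        if (stop_ && tasks_.empty()) return;
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }

  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex mtx_;
  std::condition_variable cv_;
  bool stop_ = false;
};

// One pool shared by every evaluation of the primitive, created on first use.
ThreadPool& kernel_pool() {
  static ThreadPool pool(std::max(1u, std::thread::hardware_concurrency()));
  return pool;
}

// Hypothetical element-wise kernel: out[i] = alpha * x[i] + beta * y[i].
void axpby_parallel(const float* x, const float* y, float* out, size_t n,
                    float alpha, float beta) {
  kernel_pool().parallel_for(n, [=](size_t begin, size_t end) {
    for (size_t i = begin; i < end; ++i) out[i] = alpha * x[i] + beta * y[i];
  });
}
```

Inside your primitive's CPU implementation you would call something like `axpby_parallel` on the raw input and output pointers. Because `parallel_for` blocks until all chunks are done, the primitive only returns once the output is fully written. The sketch does not handle exceptions thrown inside a task, and for very small inputs the scheduling overhead may outweigh the gain, so a size threshold that falls back to a single-threaded loop is probably worth adding.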
@angeloskath Thanks for your reply! I think a static thread pool will solve my problem. I'm curious how MLX itself supports multi-threaded CPU kernels. I tried reading the code at https://github.com/ml-explore/mlx/tree/main/mlx/backend/cpu, but most ops seem to run on a single thread.