KissThreading.jl
Add batch handling to tmap!
In a similar way to how we do it in tmapreduce.
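For context, here is a minimal sketch of the atomic-counter batching scheme discussed later in this thread (hypothetical code with a made-up name, not the package's actual implementation):

```julia
using Base.Threads

# Hypothetical sketch of a batched tmap!: each thread repeatedly claims the
# next batch of indices from a shared atomic counter until the input is done.
function tmap_batched!(f, dst::AbstractArray, src::AbstractArray; batch_size::Int = 1)
    n = length(src)
    next = Atomic{Int}(1)  # index of the first unclaimed element
    @threads for t in 1:nthreads()
        while true
            lo = atomic_add!(next, batch_size)  # atomic_add! returns the old value
            lo > n && break
            @inbounds for j in lo:min(lo + batch_size - 1, n)
                dst[j] = f(src[j])
            end
        end
    end
    return dst
end
```

Larger batches mean fewer atomic updates, but coarser load balancing when task costs vary.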
If you put an "up for grabs" tag on these issues, I can work on fixing them in my free time. I am keen to get this package registered as fast as possible. If there is no tag, I will assume you are actively working on them and will not duplicate the effort.
I have sent you an invite to the repo so that we can move it forward faster. Let us agree that you push "obvious" fixes directly and submit others (e.g. breaking or complex changes) via PR, so that I can review them before merging.
I have just pushed a commit to master that adds batching to tmap!, but I am not 100% happy with it, as it does not always help. We need more benchmarking, and the documentation should give guidance on the optimal batch size as a function of job size (in general, if individual jobs are very large, then batches should be smaller).
In the future you can assume that if I open something as an issue, it is OK for you to work on it if you want (even without an "up for grabs" label). Just please let me know in the issue that you are willing to 😄. Thank you for all your support. I will add several issues for the things I think are TO-DO.
Thanks for that. I will do what I can.
I was just about to try writing my own version of tmap! with batches, but it looks like there is one on master now. Assuming it works, I'm happy to try it out and report back on what I find about optimal batch sizes. I suspect, though, that this varies substantially by case, depending on the variance of task complexity as well as on whether one is using hyperthreading. (The comparative slowness of virtual cores is why I'm thinking of switching to tmap! from simple Threads.@threads for loops.)
Thanks! Yes - @mohamed82008 and I feel that the code is ready for testing.
For lightweight tasks, even a small batch size seems to do very well. In my tests, a batch size of 1 was only <10% slower than using the @threads macro directly on the loop. This already surprised me: I would have expected updating the atomic counter so often to add huge overhead, but apparently it does not.

Another surprise came when I used the default batch function, which is supposed to produce a few batches per thread so that every thread gets a piece of the cake. On the contrary, with this batch size there was no speedup at all. This cannot be due to the overhead of starting the other threads, because then there would be no speedup with the other batch sizes either; yet a batch size of 1 did give a speedup. So something weird is going on.
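For reference, a default batch function of the kind described above might look like the following (a hypothetical sketch; the package's actual heuristic may differ):

```julia
using Base.Threads

# Hypothetical heuristic: choose the batch size so that each thread ends up
# with roughly `batches_per_thread` batches, never going below 1.
default_batch_size(n; batches_per_thread = 4) =
    max(1, cld(n, nthreads() * batches_per_thread))
```

With n = 10^5 jobs and 4 threads this would give 16 batches of 6250 elements each.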
@mohamed82008, can you post your testing code? It would give me a good base to start from when testing the function on other cases.
It's the code in these tests. You can try passing different batch_size = ... keyword arguments and check the timings; a sketch of such a comparison follows the links below.
https://github.com/bkamins/KissThreading.jl/blob/947961c17854275d5658c8b60bb44cc160e7a858/test/bootstrap.jl#L7
https://github.com/bkamins/KissThreading.jl/blob/947961c17854275d5658c8b60bb44cc160e7a858/test/bubblesort.jl#L8
https://github.com/bkamins/KissThreading.jl/blob/947961c17854275d5658c8b60bb44cc160e7a858/test/sort_batch.jl#L7
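For example, a comparison along these lines could look as follows (a hypothetical micro-benchmark; the workload function and batch sizes are made up, and it assumes tmap! takes a map!-style argument order with the batch_size keyword mentioned above):

```julia
using KissThreading, BenchmarkTools

# Made-up CPU-bound workload whose cost varies with the input value.
work(x) = sum(sin, 1:x)

src = rand(50:150, 10^5)
dst = zeros(Float64, length(src))

for bs in (1, 16, 256, 4096)
    print("batch_size = ", bs, ": ")
    @btime tmap!($work, $dst, $src; batch_size = $bs)
end
```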
Thanks. I was just getting around to this, but it appears that KissThreading requires v0.7. (At least, it seems to use Random, which doesn't appear to exist in v0.6.3.) Am I doing something wrong?
Yes, KissThreading requires Julia 0.7. Given that we are several days from the release, I hope you will be able to test it soon.
@UserQuestions please note that KissThreading should now support Julia v0.6.