Chris Elrod

837 comments by Chris Elrod

> However, my implementation of the `CartesianSpace` iterator somehow makes single-thread performance much worse... As a result, the overall performance is still not great.

`CartesianIndices` itself has that problem, where...
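A quick way to check whether `CartesianIndices` iteration by itself costs single-thread performance is to time it against plain linear indexing over the same array. The sketch below is only a rough `@elapsed` comparison, not a careful benchmark, and the function names `sum_cartesian` and `sum_linear` are hypothetical.

```julia
# Hypothetical sketch: sum a matrix two ways, once iterating with
# CartesianIndices and once with linear indices. Both visit the elements
# in the same column-major order, so only the iteration cost differs.
function sum_cartesian(A::AbstractMatrix)
    s = zero(eltype(A))
    @inbounds for I in CartesianIndices(A)
        s += A[I]
    end
    return s
end

function sum_linear(A::AbstractMatrix)
    s = zero(eltype(A))
    @inbounds for i in eachindex(IndexLinear(), A)
        s += A[i]
    end
    return s
end

A = rand(200, 300)
# Call each once so compilation is excluded from the timings below.
sum_cartesian(A); sum_linear(A)
t_cart = @elapsed sum_cartesian(A)
t_lin  = @elapsed sum_linear(A)
println("CartesianIndices: ", t_cart, " s; linear: ", t_lin, " s")
```

For numbers you would actually act on, BenchmarkTools' `@btime` is more reliable than a single `@elapsed` call.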

This is what `ThreadingUtilities.sleep_all_tasks()` is for, but it does not seem to help much.

```julia
julia> using Polyester, LinearAlgebra, TimerOutputs

julia> BLAS.set_num_threads(1)

julia> function testmul_thread(A,B,C)
           Threads.@threads for i in 1:96...
```
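The setup in that excerpt (single-threaded BLAS, 96 independent multiplications split across Julia threads) can be sketched with only the standard library. This is not the original `testmul_thread`, whose body is truncated above; `testmul_sketch!`, the matrix size, and the vector-of-matrices layout are assumptions for illustration.

```julia
using LinearAlgebra

# Keep BLAS single-threaded so that Julia threads, not BLAS threads,
# provide the parallelism.
BLAS.set_num_threads(1)

# Hypothetical sketch: 96 independent products C[i] = A[i] * B[i],
# each computed in place with mul! into preallocated output.
function testmul_sketch!(C, A, B)
    Threads.@threads for i in eachindex(C, A, B)
        mul!(C[i], A[i], B[i])
    end
    return C
end

n = 64
A = [rand(n, n) for _ in 1:96]
B = [rand(n, n) for _ in 1:96]
C = [zeros(n, n) for _ in 1:96]
testmul_sketch!(C, A, B)
```

With Polyester loaded, the same loop can be written with `@batch` in place of `Threads.@threads` to compare the two schedulers on identical work.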

> I wonder why your result for the first run of `testmul_batch` is only `8.41ms`, which is very different from `testmul_thread`'s `52.2ms`, but the times of the two functions are similar...

> > so that we get 2 iterations/thread
>
> Maybe not 2 iterations/thread? Because the `for` loops in `testmul_thread` and `testmul_batch` each have `96` iterations, so each thread will...
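The iterations each thread receives depend only on the loop length and the number of threads: 96 iterations give 2 per thread only with 48 threads. The helper below is a hypothetical even (static) split for illustration; it is not claimed to be exactly how `Threads.@threads` or Polyester's `@batch` partition a loop.

```julia
# Hypothetical helper: split n loop iterations across nt threads as evenly
# as possible, returning how many iterations each thread gets.
function iterations_per_thread(n::Integer, nt::Integer)
    base, extra = divrem(n, nt)
    # The first `extra` threads each take one additional iteration.
    return [base + (t <= extra) for t in 1:nt]
end

iterations_per_thread(96, 48)  # 48 threads: 2 iterations each
iterations_per_thread(96, 8)   # 8 threads: 12 iterations each
iterations_per_thread(96, 10)  # uneven: six threads get 10, four get 9
```

Calling `iterations_per_thread(96, Threads.nthreads())` shows the split for the current session.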

Yeah, this isn't great. The M1 is popular, and ARM is probably only going to get more popular in the future. I think this should be less frequent in Polyester...

Thanks, I'll go ahead and merge this, and then try it out!

By crashes in serial execution, do you mean it throws an error? Normally I think of a crash as "Julia exits". The fix here is probably to have [ThreadingUtilities.wait](https://github.com/JuliaSIMD/ThreadingUtilities.jl/blob/85a1ade890c3afeca93d96265493d9c7da9faaaa/src/threadtasks.jl#L56) check if the...

I think the API will be that `wait` prints any errors, resets the tasks automatically, and then returns an error code. The caller can then decide what to do, e.g....
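The proposed behaviour (print any errors, then return a status the caller can check instead of throwing) can be sketched with ordinary Julia tasks. This is a hypothetical illustration: `wait_and_report` is not ThreadingUtilities' API, and it uses plain `Threads.@spawn` tasks, so the "reset the tasks" step, which applies to ThreadingUtilities' persistent per-thread tasks, is left out.

```julia
# Hypothetical sketch: wait on every task, print any error, and return a
# Bool status instead of rethrowing. Waiting on a failed task throws a
# TaskFailedException, which is caught and printed here.
function wait_and_report(tasks)
    ok = true
    for t in tasks
        try
            wait(t)
        catch err
            ok = false
            showerror(stderr, err)  # print the error and keep going
            println(stderr)
        end
    end
    return ok  # the caller decides what to do on `false`
end

good = [Threads.@spawn(i^2) for i in 1:4]
bad  = [Threads.@spawn(error("task failed")), Threads.@spawn(1 + 1)]

wait_and_report(good)  # true
wait_and_report(bad)   # prints the error, returns false
```

Returning a status rather than throwing leaves the choice to the caller, for example retrying serially or rethrowing.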

On an Intel laptop:

```julia
julia> @benchmark with_batch()
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max): 17.436 μs … 73.138 ms ┊ GC (min … max): 0.00% …...
```

One workaround is to set a `minbatch` size:

```julia
julia> function with_minbatch()
           # Just some loop with @batch with basically no runtime
           @batch minbatch=100 for i in 1:2
               nothing
           end...
```
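Why `minbatch` helps here: assuming `minbatch` is the minimum number of iterations each thread must receive, a 2-iteration loop with `minbatch=100` cannot be split at all and just runs on the calling thread, so no other threads are woken. The function below is a hypothetical model of that rule, not Polyester's actual code.

```julia
# Hypothetical model (an assumption, not Polyester's implementation) of how
# a minimum batch size caps the number of threads a loop is split across.
function batches_used(len::Integer, minbatch::Integer, nthreads::Integer)
    # Each batch must get at least `minbatch` iterations; always at least 1 batch.
    return clamp(div(len, minbatch), 1, nthreads)
end

batches_used(2, 100, 8)       # 1: the 2-iteration loop runs serially
batches_used(300, 100, 8)     # 3
batches_used(10_000, 100, 8)  # 8: enough work to use every thread
```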