Chris Elrod

832 comments by Chris Elrod

> My working theory is that it may have something to do with certain powers of 2 being magic slow numbers. From [Agner Fog's C++ optimization manual, pages 89-90](https://www.agner.org/optimize/optimizing_cpp.pdf): >...
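The cache-associativity argument behind this kind of theory can be made concrete with the critical stride. Addresses that differ by a multiple of (number of sets) × (line size) = (cache size) ÷ (ways) map to the same cache set, so only `ways` of them can be resident at once. The sketch below assumes a typical 32 KiB, 8-way L1 data cache with 64-byte lines (assumed values, not measured on any machine from this thread) and a base address aligned to the critical stride. `nsets`, `critical_stride`, `set_index`, and `row_walk_sets` are hypothetical helper names.

```julia
# Assumed cache geometry (typical values, not measured): 32 KiB L1D, 8-way, 64-byte lines.
const CACHE_BYTES = 32 * 1024
const WAYS = 8
const LINE_BYTES = 64

# Number of sets = cache size ÷ (ways × line size).
nsets(cache_bytes, ways, line_bytes) = cache_bytes ÷ (ways * line_bytes)

# Critical stride = number of sets × line size = cache size ÷ ways.
critical_stride(cache_bytes, ways) = cache_bytes ÷ ways

# Set index a byte address maps to (simplified model: base address assumed aligned).
set_index(addr, line_bytes, sets) = (addr ÷ line_bytes) % sets

# Sets touched when walking along one row of a column-major Float64 matrix
# with `nrows` rows: consecutive row elements are 8 * nrows bytes apart.
function row_walk_sets(nrows, ncols; line_bytes = LINE_BYTES,
                       sets = nsets(CACHE_BYTES, WAYS, LINE_BYTES))
    stride = sizeof(Float64) * nrows
    return [set_index(j * stride, line_bytes, sets) for j in 0:ncols-1]
end
```

Under these assumptions, a matrix with 512 rows has a column stride of exactly 4096 bytes, so every element of a row lands in set 0 and walking more than 8 columns starts evicting earlier lines; padding to 520 rows spreads the same 16-column walk across 16 different sets.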

> A lot of this is from ForwardDiff not caching some compiles. @chriselrod let's take a look at that. As discussed on slack, tagging is likely part of the problem....

What is the API for the cast functions? They don't seem to be documented. Conversions between integer types are missing, e.g.:

- https://github.com/QuantStack/xsimd/blob/ca75511f497bc8eb178c42db94d3da4d80a8f318/include/xsimd/types/xsimd_avx_conversion.hpp
- https://github.com/QuantStack/xsimd/blob/ca75511f497bc8eb178c42db94d3da4d80a8f318/include/xsimd/types/xsimd_avx512_conversion.hpp

Casting between 64-bit and 32-bit unsigned...

```julia
julia> while true; @btime ThreadsX.sort!(xs) setup=(xs=rand(MersenneTwister(@show(seed[] += 1)), 0:0.01:1, 1_000_000)); end
...
seed[] += 1 = 34162
seed[] += 1 = 34163
seed[] += 1 = 34164
seed[] +=...
```

Have you tried to reproduce it with the following?

```julia
using BenchmarkTools, ThreadsX, Random

seed = Ref(0);
while true
    # `@show` prints each seed, so the input being sorted when it hangs is known.
    @btime ThreadsX.sort!(xs) setup=(xs=rand(MersenneTwister(@show(seed[] += 1)), 0:0.01:1, 1_000_000));
end
```
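Because the loop above seeds a fresh `MersenneTwister` for every setup and prints the seed first, the input built for any printed seed can be regenerated exactly and sorted in isolation. A minimal sketch, assuming only the `Random` standard library; `make_input` is a hypothetical helper name, and `34164` is just one seed from the log above, not necessarily the one that hung. Julia does not promise identical random streams across versions, so replay on the same Julia build.

```julia
using Random

# Hypothetical helper: rebuild the array the benchmark loop generates for `seed`.
# ThreadsX is not needed just to regenerate the data.
make_input(seed::Integer; n::Integer = 1_000_000) =
    rand(MersenneTwister(seed), 0:0.01:1, n)

# Example: regenerate the input for a seed printed in the log.
xs = make_input(34164)
```

The regenerated `xs` can then be passed to `ThreadsX.sort!` repeatedly to check whether that particular input is what triggers the hang.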

No hang? Interesting. How long have you been running it? Normally it doesn't take long to hang for me. The last three were after 10, 8,...

Did you try `JULIA_NUM_THREADS=13` with 1.5.0-DEV.464? I wanted to bisect, so I jumped back to 1.5.0-DEV.460, and got the hang there.

That's incredible -- really neat script! I ran `while true; @btime ThreadsX.sort($(rand(0:0.01:1, 1_000_000))); end` for more than half an hour without a problem. I picked some time on the...

I modified the script to use `ps` instead of `top`, because my OS provides `htop` instead of `top` (i.e., `/usr/bin/top` is actually `htop`), which doesn't have the `-b` option, and...
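A minimal sketch of sampling a process's CPU usage with `ps` rather than `top -b`, assuming a `ps` that accepts `-o %cpu= -p PID` (procps on Linux does; the trailing `=` suppresses the header). `parse_cpu` and `cpu_percent` are hypothetical names, not taken from the original script. Note that procps `ps` reports `%cpu` as CPU time divided by the process's elapsed time, so it is an average over the process lifetime rather than an instantaneous reading.

```julia
# Hypothetical helpers, not from the original script.
# Parse the single value printed by `ps -o %cpu= ...` (leading spaces, trailing newline).
parse_cpu(output::AbstractString) = parse(Float64, strip(output))

# Query the CPU percentage of `pid` (defaults to the current Julia process).
function cpu_percent(pid::Integer = getpid())
    out = read(`ps -o %cpu= -p $pid`, String)
    return parse_cpu(out)
end
```

Keeping the parsing separate from the `ps` call makes it easy to check the parser without depending on which `ps` the system provides.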

You're right, I just ran into this problem. It hung, but was declining slowly from >700% (with 8 threads). It wouldn't have gotten to