Chris Elrod
> This does not happen on x86.

Even with `--check-bounds=no`? There have been similar reports elsewhere about `--check-bounds=no` causing inference problems that I could reproduce on x86, e.g. https://github.com/JuliaSIMD/StrideArrays.jl/issues/78
I suspect there is a bug in ThreadingUtilities' or Polyester's atomics, and that it is exposed on Apple silicon because those chips have much better out-of-order execution capabilities than anything else.
I think it is a lot more likely to be a bug in Polyester than Julia or LLVM, so I wouldn't be surprised if it shows up again on 1.10.
> When using 30 threads to run the command `matmul!(xtx, X', X)`, I encountered an error. However, when using 20 threads, no error was observed.

How many physical cores does...
I suspect the illegal write isn't to `xtx`, but to the temporary packed matrix `X'`. What do you get for
```julia
julia> Octavian.first_cache_size(Val(eltype(X)))
static(65536)

julia> Octavian.first_cache_size(Val(eltype(X))) / length(X)
10.076260762607626
```
...
Do you think you could get an rr trace of the error? That is, start Julia with `--bug-report=rr` and upload the trace?
@ranocha should be able to create releases.
Octavian's blocking strategy/algorithm is outright bad/suboptimal. Anyway, why is it even trying to block? The overall size of the arrays is large, but the reduction dimension is so small that...
I think it's unlikely that fitting into memory will be a problem. FWIW, I get an extreme value for bandwidth currently when multithreaded, as the array fits in local caches....
It's because of Julia's not-specializing-on-type heuristics. You could make a PR that has the macro call wrap all arguments to `batch`, and then `unwrap` them at the start of the...
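For illustration, here is a minimal sketch of the wrap/unwrap idea. The names `Wrapped`, `wrap`, `unwrap`, `kernel`, and `run_wrapped` are hypothetical and are not Polyester's actual API. The background is that Julia may skip specializing a method on `Type`, `Function`, and `Vararg` arguments that are only passed through to another call. Putting the value inside a parametric struct moves that information into a type parameter, so the outer method should get specialized on it.

```julia
# Hypothetical wrapper type: the wrapped value's type becomes a type parameter.
struct Wrapped{T}
    value::T
end

# Generic values (including functions) are wrapped as Wrapped{typeof(x)}.
wrap(x) = Wrapped(x)
# Types need special handling: Wrapped(Float64) would be Wrapped{DataType},
# which loses the information, so store it as Wrapped{Type{Float64}} instead.
wrap(::Type{T}) where {T} = Wrapped{Type{T}}(T)

unwrap(w::Wrapped) = w.value

# Inner kernel that actually uses the function and the type.
kernel(f, ::Type{T}, x) where {T} = f(T(x))

# Outer function that only forwards its arguments. It receives wrapped
# values and unwraps them at the start of the body.
function run_wrapped(wf::Wrapped, wT::Wrapped, x)
    f = unwrap(wf)
    T = unwrap(wT)
    return kernel(f, T, x)
end

result = run_wrapped(wrap(sqrt), wrap(Float64), 4)
```

Whether this helps in a given case is best checked by inspecting the compiled specializations or with a benchmark, since the heuristics depend on how the arguments are used.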