Seyoon Ko

36 comments by Seyoon Ko

See the related comments in https://github.com/JuliaParallel/MPI.jl/pull/266. I considered this when I contributed code for CUDA-aware MPI, not knowing about the default global synchronization in CUDA.jl. One of the core contributors for this...

Thank you for your suggestion. I'm trying to follow it, but I might need some help. I will soon add a comment describing what I'm trying.

The function `gemv_tile!(c, A, b)` doesn't seem to work with Julia 1.4.1 and the following package versions: [bdcacae8] LoopVectorization v0.8.19, [3d5dd08c] VectorizationBase v0.12.25. With the setup above, I get a StackOverflowError: ``` StackOverflowError: Stacktrace: [1]...

There already is a padded data structure. How do I get rid of all the checks?

```julia using LoopVectorization function _snparray_ax_additive!(out, s::Matrix{UInt8}, v) fill!(out, zero(UInt8)) k = size(s, 1) @avx for j ∈ eachindex(v) for l in 1:k block = s[(j-1)*size(s, 1) + (l-1) + 1]...

It works correctly without the intermediate variable `i`: ```julia function _snparray_ax_additive!(out, s::Matrix{UInt8}, v) fill!(out, zero(UInt8)) k = size(s, 1) @avx for j ∈ eachindex(v) for l in 1:k block =...

Thanks for the information!

Sorry, I messed up reproducing the issue with simpler code. Actually, `g` is a 2-bit packed array: ```julia using BenchmarkTools, LoopVectorization @inline function f(g::AbstractArray{UInt8}, q::AbstractArray{T}, f) where T oneT =...

Looping with direct elementwise access (e.g. https://github.com/kul-forbes/ProximalOperators.jl/blob/master/src/functions/normL2.jl#L37) is expected to be very slow on GPUs, and `prox_naive` would be much more efficient. Adding `prox_naive!` and defining `prox` and `prox!` for...

Thank you for the comment. I have to point out that this is not one specific case: I could quickly find many of them under the directory `functions/`. I will...