Bernat Font
I made a test comparing a custom dot product written with `mapreduce` against `LinearAlgebra.⋅`. It performs well on GPU, but on CPU the allocation size scales with the array size `N`. This...
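For reference, a minimal sketch of the kind of comparison described; the array size and the name `dot_mr` are illustrative, not from the original test:

```julia
using LinearAlgebra, BenchmarkTools

# Custom dot product via mapreduce. GPUArrays provides a non-indexing
# mapreduce, so this also runs on GPU arrays; on CPU, the two-array form
# lowers to reduce(+, map(*, a, b)), which materializes an N-sized
# temporary, hence allocations that scale with N.
dot_mr(a, b) = mapreduce(*, +, a, b)

a, b = rand(Float32, 2^20), rand(Float32, 2^20)
@btime dot_mr($a, $b)  # allocations grow with N on CPU
@btime $a ⋅ $b         # LinearAlgebra.⋅, allocation-free on CPU
```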
For CPU arrays, LinearAlgebra has a simple method for `dot`: https://github.com/JuliaLang/julia/blob/4e1fd72042133f0dfdcc3f4c3ce5fa74f76bb9c5/stdlib/LinearAlgebra/src/generic.jl#L957 For GPU arrays, however, they just offer an interface to CUBLAS. As a last resort, we could implement a...
This does not work on GPU for me; it rightfully complains about indexing.
I suggest including the following in `utils.jl`:

```julia
function ⋅(a, b)
    @assert size(a) == size(b) "`size(a)` and `size(b)` are not matching."
    s = zero(promote_type(Float64, eltype(a)))
    @simd for i ∈ eachindex(a)
        @inbounds s += a[i] * b[i]
    end
    return s
end
```
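Assuming the `⋅` above is in scope, a quick sanity check could look like this (sizes are arbitrary):

```julia
a, b = rand(Float32, 64, 64, 64), rand(Float32, 64, 64, 64)
@assert a ⋅ b ≈ sum(a .* b)  # agrees with the broadcast version up to round-off
```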
Thanks for catching that. I was aware that `nthreads==1` during precompilation was problematic, but during execution it was working as intended. Using `Preferences` seems like a nice workaround. I will...
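A hedged sketch of what the `Preferences`-based workaround could look like; the preference name `"nthreads"` and the module `MyPkg` are illustrative, not from the PR:

```julia
# Package side: read the preference when the package is (pre)compiled,
# instead of baking in Threads.nthreads(), which is 1 during precompilation.
module MyPkg
using Preferences
const NTHREADS = @load_preference("nthreads", Threads.nthreads())
end

# User side: set once, takes effect on the next Julia session.
# (Requires MyPkg to be an actual package with a UUID.)
using Preferences
set_preferences!(MyPkg, "nthreads" => 8)
```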
I did some preliminary benchmarks of this PR with different mesh sizes `N=2^(3*p)`. Overall, it seems that the PR is a bit slower than master on GPU. The only...
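A sketch of how such a size sweep can be scripted; the kernel `f!` is a stand-in, not the PR's actual code:

```julia
using BenchmarkTools

f!(a, b) = (a .= a .+ b; nothing)  # placeholder for the hot loop under test

for p in 5:7
    n = 2^p  # mesh size N = 2^(3*p) = n^3 cells
    a, b = rand(Float32, n, n, n), rand(Float32, n, n, n)
    t = @belapsed f!($a, $b)
    println("p = $p, N = $(n^3) cells: $(round(t*1e3, digits=3)) ms")
end
```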
I did some more benchmarks after a local merge of master with this PR. All looks good except for removing the workgroup size as we had it before (`64`). Here...
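For reference, this is where an explicit workgroup size enters with a recent KernelAbstractions; the kernel is a toy example, not one of the package's kernels:

```julia
using KernelAbstractions

@kernel function mul2!(a)
    I = @index(Global)
    @inbounds a[I] *= 2
end

a = rand(Float32, 2^20)
backend = get_backend(a)
mul2!(backend, 64)(a, ndrange=length(a))  # fixed workgroup size of 64
# mul2!(backend)(a, ndrange=length(a))    # default: let KA pick the size
KernelAbstractions.synchronize(backend)
```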
For example, the TGV case is a 3D case for which I tested domain sizes of 64^3 and 128^3. The arrays we use then have sizes `(64,64,64)` and `(64,64,64,3)` (analogously for...
Sure, I will do some tests after my summer break. But does this mean that we cannot use the default workgroup size (as in this PR)? Could this be something...
I have tested the changes, and while the results improve, performance is still not there (again, `9b6ca77` is this PR). There might be something else going on, but I am unsure what...