LambertW.jl
Cannot use `lambertw` on the GPU
I've been trying to use this function on the GPU, but I always get an error (the original issue I posted is https://github.com/CliMA/Oceananigans.jl/issues/3438). I mostly work with the GPU indirectly through KernelAbstractions.jl, and the following MWE illustrates the error I'm getting:
```julia
using KernelAbstractions
using CUDA: CuArray
using LambertW: lambertw

# Trivial kernel that applies lambertw elementwise.
@kernel function mul2_kernel(A)
    I = @index(Global)
    A[I] = lambertw(A[I])
end

A = CuArray(ones(10, 10))
backend = get_backend(A)
mul2_kernel(backend, 64)(A, ndrange=size(A))
synchronize(backend)
display(A)
```
Instead of working, this produces a huge error that starts with:
```
ERROR: LoadError: InvalidIRError: compiling MethodInstance for gpu_mul2_kernel(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(64, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}}, ::CUDA.CuDeviceMatrix{Float64, 1}) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to __memcmp_evex_movbe)
Stacktrace:
 [1] _memcmp
   @ ./strings/string.jl:124
 [2] ==
   @ ./strings/string.jl:136
 [3] multiple call sites
   @ unknown:0
Reason: unsupported call to an unknown function (call to julia.get_pgcstack)
Stacktrace:
 [1] multiple call sites
   @ unknown:0
```
You can see the whole error here.
I was also able to come up with a simpler example using CUDA.jl that produces a similar error, but for some reason it occasionally fails with a segfault instead, so I'm not sure whether the same thing is going on (or maybe I'm doing something wrong, since I've never really worked with CUDA directly). Here's the CUDA MWE:
```julia
using LambertW: lambertw
using CUDA: CuArray

A = CuArray(ones(10, 10))
B = lambertw.(A)
display(B)
```
Is there any way to make `lambertw` work on the GPU, and more specifically with KernelAbstractions?
Thanks!
See https://github.com/CliMA/Oceananigans.jl/issues/3438#issuecomment-1910836945
I am pretty busy with my day job, etc. But the best solution I can think of off the top of my head is to start by returning the result of the root finding and the number of iterations, then put a nice interface on it and avoid the warning altogether.
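Very roughly, something like the following untested sketch (the name `_lambertw_iter` and all details here are illustrative, not the package's current internals):

```julia
# Hypothetical pure inner routine: no IO anywhere, so it is GPU-safe.
# Returns the root together with the number of iterations used; the
# caller decides what to do when the count hits maxits.
function _lambertw_iter(z, w, maxits)
    for i in 1:maxits
        ew = exp(w)
        f = w * ew - z   # residual of w * exp(w) = z
        # Halley step for f(w) = w * exp(w) - z (assumes w != -1)
        δ = f / (ew * (w + 1) - (w + 2) * f / (2w + 2))
        w -= δ
        abs(δ) <= 2eps(one(w) + abs(w)) && return w, i
    end
    return w, maxits
end
```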
If you can run a forked or local version, try commenting out the `@warn` line and see if the error disappears, just to confirm that it really is the problem.
Thanks for checking into this. I haven't tested it myself, but it seems that the `@warn` line is indeed the cause of the error here, even though that line is never actually reached.
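In case it's useful context: the GPU compiler has to compile every branch it can see, regardless of whether it's ever taken at runtime, and the logging machinery behind `@warn` lowers to string comparisons and task-state calls that are invalid in GPU code (which matches the `_memcmp` and `julia.get_pgcstack` entries in the error above). Here's a minimal sketch, unrelated to LambertW's actual internals, that should fail with a similar `InvalidIRError` even though the warning branch is never taken:

```julia
using KernelAbstractions
using CUDA: CuArray

# The warning branch is dead for positive inputs, but it still gets
# compiled for the GPU, which is enough to break kernel compilation.
function maybe_warn(x)
    x < 0 && @warn "negative input"
    return 2x
end

@kernel function warn_kernel(A)
    I = @index(Global)
    A[I] = maybe_warn(A[I])
end

A = CuArray(ones(10))
warn_kernel(get_backend(A), 64)(A, ndrange=size(A))  # expected: InvalidIRError
```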
I can of course fork the repo, comment that line out, and use it for my purposes, but I'd like to ask whether you'd be open to changing the code here instead. In general I think it's advantageous for packages to be GPU-compatible, but I'm curious about your thoughts.
If you agree, I could submit a PR changing the function to return the value plus a flag (0 when converged, 1 when not), or something along the lines of what you suggested.
Also, I see there's been some discussion about possibly moving this package to SpecialFunctions.jl in https://github.com/JuliaMath/SpecialFunctions.jl/pull/371. In my opinion, moving `lambertw` there would result in more eyes on it for issues like this. Since you seem to be the main developer here according to the contributors page, it would put less pressure on you to interact with issues like this one when you're busy, which seems advantageous. Just a thought :)
- To me, it's very clear that the current design is wrong. There should be at least an inner function, with a good interface, that does not do IO.
- A PR would be good. I'd like to try to get a good design because this package has a significant number of users. In particular, it would be best if current users see no visible change; that is, they can use the same API they are already using. (There is no reason to drop the warning for these users, but even if we did, I doubt it would make any difference, as I think it is difficult to impossible for the root finding to fail.)
- You saw an example of the discussion of whether to aggregate or dis-aggregate software. It goes on ad infinitum, but data is never mentioned. We decided to take the first step of moving this repo from my personal account to the JuliaMath org. I thought the wiser prediction was that it would see no change in the attention this package gets. It seems clear that prediction was borne out: no one but me has done anything to maintain this package. I'm not really against moving it to SpecialFunctions, but I would be very surprised if it received any more attention there just due to its new location. On the other hand, I'd be more reluctant to merge my own PRs there. I'd prefer not to do it here either, but although several other people have merge privileges for this repo, I don't expect any of them to do it.
Perhaps a good starting point is a subtype `NoWarn` of `LambertWMode`. This is a minimal change in the API to satisfy the needs of GPU users. So: `lambertw(z, k, maxits)` as before for CPU users, and maybe `lambertw(z, k, maxits; mode=NoWarn())` for GPU users. These would be wrappers around a function that does the computation and returns a pair of items in some way: the result and a flag indicating success or failure.
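For concreteness, a tentative sketch of that design, reusing `_lambertw_iter` from the sketch above (every name here is hypothetical, not a committed API):

```julia
abstract type LambertWMode end
struct Warn <: LambertWMode end    # current default: warn on non-convergence
struct NoWarn <: LambertWMode end  # GPU-safe: no IO on any code path

# Public wrapper keeps the existing positional API; the pure inner
# function does the actual work and reports the iteration count.
function lambertw(z, k::Integer=0, maxits::Integer=1000;
                  mode::LambertWMode=Warn())
    w0 = log1p(abs(z))  # crude placeholder; the real code picks a guess per branch k
    w, iters = _lambertw_iter(z, w0, maxits)
    if mode isa Warn && iters == maxits  # treat hitting maxits as failure
        @warn "lambertw: reached maxits = $maxits without converging"
    end
    return w
end
```

One nice property: when `mode::NoWarn` is a concrete type, `mode isa Warn` folds to `false` at compile time, so the `@warn` branch is eliminated before GPU codegen ever sees it.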