KernelAbstractions.jl

InvalidIRError: Reason: unsupported dynamic function invocation (call to print_to_string(xs...))

Open: drrmmng opened this issue 3 years ago • 1 comment

Hi,

When I run the code below on the CPU, everything works as expected. When I run it on the GPU, I get the following error:

```
ERROR: LoadError: InvalidIRError: compiling kernel gpu_get_segment_indices_kernel!(KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.StaticSize{(3, 10)}, KernelAbstractions.NDIteration.DynamicCheck, Nothing, Nothing, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.StaticSize{(1, 10)}, KernelAbstractions.NDIteration.StaticSize{(16, 1)}, Nothing, Nothing}}, CuDeviceMatrix{CartesianIndex{2}, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to print_to_string(xs...) in Base at strings/io.jl:133)
```

Any ideas? Maybe related to https://github.com/JuliaGPU/KernelAbstractions.jl/issues/286

The code:

```julia
using CUDA
using CUDAKernels
using KernelAbstractions

@kernel function get_segment_indices_kernel!(result, @Const(x), @Const(knots))
    I = @index(Global, NTuple)
    variable_index = I[1:end - 1]

    result[I...] = CartesianIndex(
        clamp(
            searchsortedlast(knots[:, variable_index...], x[I...]),
            firstindex(knots, 1),
            lastindex(knots, 1) - 1,
        ),
        variable_index...,
    )

    nothing
end

function get_segment_indices(x, knots)
    kernel_device = begin
        if x isa CuArray
            CUDADevice()
        else
            CPU()
        end
    end

    indices = similar(x, eltype(CartesianIndices(x)))

    kernel = get_segment_indices_kernel!(kernel_device, 16, size(x))
    event = kernel(indices, x, knots)
    wait(event)

    indices
end

# todevice = identity
todevice = cu

n_obs = 10
n_variables = 3
n_bins = 4

x = rand(n_variables..., n_obs) |> todevice
knots = sort(rand(n_bins, n_variables...), dims=1) |> todevice

indices = get_segment_indices(x, knots)
```

Versions:

julia version 1.7.1
[052768ef] CUDA v3.8.3
[72cfdca4] CUDAKernels v0.4.0
[63c18a36] KernelAbstractions v0.8.0

drrmmng, Mar 09 '22 12:03

Is there anything I could do to improve this issue?

drrmmng, Mar 16 '22 14:03
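
One thing that may be worth trying, offered as an untested guess rather than a confirmed diagnosis: the kernel slices `knots[:, variable_index...]`, which allocates a new array for every work item, and array allocation is generally not available inside CUDA.jl kernels. The sketch below shows, in plain Julia on the CPU, a binary search down one column of `knots` that uses only scalar indexing, so nothing is sliced. `column_searchsortedlast` is a hypothetical helper name (not part of Base or KernelAbstractions), and the sketch assumes the two-dimensional case from the example, where there is a single variable axis.

```julia
# Hypothetical helper, not part of Base or KernelAbstractions.
# Binary search down column `col` of `knots` using scalar indexing only.
# Returns the last row index i with knots[i, col] <= v, or
# firstindex(knots, 1) - 1 if v is smaller than every entry in the column.
# Assumes the column is sorted in ascending order.
function column_searchsortedlast(knots::AbstractMatrix, col::Integer, v)
    lo = firstindex(knots, 1) - 1
    hi = lastindex(knots, 1) + 1
    while lo < hi - 1
        mid = lo + ((hi - lo) >>> 1)  # midpoint without overflow
        if v < knots[mid, col]
            hi = mid
        else
            lo = mid
        end
    end
    return lo
end

# CPU check against Base.searchsortedlast on an explicit column slice.
knots = sort(rand(Float32, 4, 3), dims=1)
for col in axes(knots, 2), v in (0f0, 0.25f0, 0.5f0, 0.75f0, 1f0)
    @assert column_searchsortedlast(knots, col, v) ==
            searchsortedlast(knots[:, col], v)
end
```

Inside the kernel, the call would then look like `column_searchsortedlast(knots, I[1], x[I...])`, with `I[1]` taking the place of `variable_index...`. Whether this removes the `print_to_string` error has not been verified here.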