
Refactor apply! by adding apply_row! to eliminate duplicate code between GPU and CPU

Open Shayan-P opened this issue 11 months ago • 0 comments

In the current implementation, there is a lot of duplicate code in the GPU extension. The reason is that the apply! function indexes into the rows of the Tableau, so that code cannot run on the GPU, and we therefore have to implement every gate as a separate GPU kernel.

A higher-level abstraction would be an apply_row! function, implemented only once per gate, that can be used by both the CPU and the GPU code.

Note:

We should inline the call to apply_row! in order to eliminate the overhead of function calls inside the GPU kernel.

My experiment here shows that with @inline this is as fast as our previous version, but without @inline it is almost twice as slow on the GPU.
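
For concreteness, here is a rough sketch (not the actual implementation) of what such an @inline apply_row! method could look like for sHadamard. It assumes the usual bit-packed layout of Tableau.xzs (column r holds the X bits of frame r in its first half and the Z bits in its second half), that the gate stores its target qubit in a field q, and that frame phases can be ignored as usual in Pauli-frame simulation:

@inline function apply_row!(xzs::AbstractMatrix{Tme}, measurements::AbstractMatrix{Bool},
                            r, gate::sHadamard) where {Tme <: Unsigned}
    q = gate.q                                   # target qubit (assumed field name)
    bits = 8 * sizeof(Tme)                       # bits per packed word
    nchunks = size(xzs, 1) ÷ 2                   # words per X (or Z) half
    chunk = (q - 1) ÷ bits + 1                   # word containing qubit q
    mask = Tme(1) << ((q - 1) % bits)            # bit of qubit q inside that word
    x = xzs[chunk, r] & mask
    z = xzs[chunk + nchunks, r] & mask
    # Hadamard exchanges the X and Z components of the frame Pauli on qubit q
    xzs[chunk, r] = (xzs[chunk, r] & ~mask) | z
    xzs[chunk + nchunks, r] = (xzs[chunk + nchunks, r] & ~mask) | x
    return nothing
end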

Then apply! can simply be implemented like this:

function single_kernel(xzs::AbstractMatrix{Tme}, measurements::AbstractMatrix{Bool}, gate, rows::Unsigned) where {Tme <: Unsigned}
    # one GPU thread per Pauli-frame row
    r = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if r > rows
        return nothing
    end
    # all gate-specific logic lives in apply_row!, shared with the CPU path
    apply_row!(xzs, measurements, r, gate)
    return nothing
end
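
The kernel can be launched with plain CUDA.jl; the @run_cuda helper used below presumably wraps a launch of this shape (the block size here is purely illustrative):

using CUDA

function launch_single_kernel(xzs, measurements, gate, rows::Unsigned)
    threads = 256                          # block size chosen for illustration
    blocks = cld(rows, threads)            # enough blocks to cover every frame row
    @cuda threads=threads blocks=blocks single_kernel(xzs, measurements, gate, rows)
    return nothing
end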


function apply!(frame::PauliFrameGPU{T}, gate) where {T <: Unsigned}
    stab = frame.frame
    rows::Unsigned = size(stab, 1)          # number of Pauli frames
    tab = QuantumClifford.tab(stab)
    # launch one GPU thread per frame row
    @run_cuda single_kernel(tab.xzs, frame.measurements, gate, rows) rows
    return frame
end
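
On the CPU side, the same apply_row! methods could then be reused with a plain loop. A minimal sketch, assuming the PauliFrame fields match those used above:

function apply!(frame::PauliFrame, gate)
    tab = QuantumClifford.tab(frame.frame)
    @inbounds for r in 1:size(frame.frame, 1)    # one iteration per frame row
        apply_row!(tab.xzs, frame.measurements, r, gate)
    end
    return frame
end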

Shayan-P, Aug 27 '23 01:08