dcompute

Dynamic/static shared memory support

Open: Rob-Rau opened this issue 5 years ago • 4 comments

I have been evaluating this project recently and am curious about the status of dynamic and static shared memory support. Playing around with some test code, I can't get the compiled PTX to emit the proper linkage for either.

I'd like to offer my help in getting these features implemented, as I would very much like to use them in a project I'm working on.

Rob-Rau, May 10 '20 17:05

I'm also interested in accessing shared memory. Here's what I have from a little digging around:

  1. Dereferencing SharedPointer!T variables emits the proper opcodes in the PTX file: ld.shared and st.shared.
  2. Dereferencing Shared!(uint[32]) variables, or similar, yields ld.local and st.local instructions in the PTX, so that's a no-go.
  3. Static shared variables are reportedly declared with .shared directives at the start of PTX files.
  4. Two PTX special registers, %total_smem_size and %dynamic_smem_size, report the amount of shared memory in use (with 256-byte granularity on recent compute capabilities).
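The granularity note in item 4 is plain arithmetic: a request is rounded up to the next multiple of the granule. Here is a minimal host-side D sketch, assuming the 256-byte granularity quoted above. The helper name roundUpSmem is hypothetical and is not part of dcompute or PTX.

```d
/// Hypothetical helper: round a shared-memory request up to the allocation
/// granularity. The 256-byte default is the figure quoted in this thread (an assumption).
size_t roundUpSmem(size_t bytes, size_t granule = 256) pure nothrow @safe @nogc
{
    assert(granule > 0, "granule must be non-zero");
    // Integer ceil(bytes / granule) * granule.
    return (bytes + granule - 1) / granule * granule;
}

// Under the assumed 256-byte granularity, a Shared!(uint[32]) buffer
// (32 * 4 = 128 bytes) would be accounted as 256 bytes.
static assert(roundUpSmem(uint.sizeof * 32) == 256);
static assert(roundUpSmem(0) == 0);
static assert(roundUpSmem(256) == 256);
static assert(roundUpSmem(257) == 512);
```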

I've got a couple of hacks to try, and I'll keep digging, but help is always appreciated.

bcarneal, Mar 02 '21 21:03

Sorry I haven't gotten back to you until now, @bcarneal. Have you made any progress? The main obstacle to shared memory support right now lies in LDC. I opened an issue covering what I've found over in that repo: https://github.com/ldc-developers/ldc/issues/3499

We need a way to make LDC emit the proper shared linkage, as outlined in the above issue, when accessing either static or dynamic shared memory.

I personally believe having

Shared!(uint[32]) var;

should generate the static linkage, and perhaps something like

extern SharedPointer!T var;

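to emit the dynamically linked code (this mimics CUDA, where __shared__ arrays are sized statically and extern __shared__ arrays are sized at kernel launch).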
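LLVM's NVPTX backend places shared memory in address space 3. The two forms proposed above would need two different kinds of IR global. The static form is a sized array defined in the module, written the same way as the __irEx workaround later in this thread. The dynamic form is a zero-length external array whose real size is only known at launch, which is roughly how clang lowers CUDA's extern __shared__ arrays.

The following is a minimal host-side D sketch that only builds those declaration strings. The helper names are hypothetical and are not part of LDC or dcompute, and the output uses the IR syntax seen elsewhere in this thread.

```d
import std.conv : to;

// Hypothetical helpers (not LDC or dcompute API): the IR globals each proposed
// declaration form would need. NVPTX shared memory lives in LLVM address space 3.
enum sharedAddrSpace = 3;

/// `Shared!(T[N]) var;` -> a statically sized shared array defined in this module.
string staticSharedDecl(string name, string elemType, size_t n, size_t alignment)
{
    return "@" ~ name ~ " = addrspace(" ~ sharedAddrSpace.to!string ~ ") global ["
        ~ n.to!string ~ " x " ~ elemType ~ "] zeroinitializer, align "
        ~ alignment.to!string;
}

/// `extern SharedPointer!T var;` -> a zero-length external array whose real size
/// is supplied at kernel launch (the same idea as CUDA's `extern __shared__ T a[];`).
string dynamicSharedDecl(string name, string elemType, size_t alignment)
{
    return "@" ~ name ~ " = external addrspace(" ~ sharedAddrSpace.to!string
        ~ ") global [0 x " ~ elemType ~ "], align " ~ alignment.to!string;
}

// Both are ordinary functions, so they can also run at compile time (CTFE).
static assert(staticSharedDecl("buf", "i32", 32, 4)
    == "@buf = addrspace(3) global [32 x i32] zeroinitializer, align 4");
static assert(dynamicSharedDecl("dynSmem", "float", 4)
    == "@dynSmem = external addrspace(3) global [0 x float], align 4");
```

The external declaration has no initializer, so the size of the dynamic region is left to the launch configuration rather than fixed in the module.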

The dcompute address-space structs such as Shared and SharedPointer are special-cased internally in LDC. The last time I was hacking on it, I had trouble getting it to view these structs as their underlying pointer instead of as a value type. It's been a while since I last looked into this, so my memory is probably a bit rusty.

Rob-Rau, May 28 '21 21:05

I have very little to add to my earlier post. Currently I'm using per-block scratch areas carved out of appropriately aligned global memory for any cooperative work, so there's not much rush here.
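The per-block scratch approach only needs each block's offset into one global buffer, with every area padded to a common alignment. Here is a minimal host-side D sketch under those assumptions. The names alignUp, scratchOffset and scratchTotal are hypothetical, and the alignment is assumed to be a power of two. On the device, each block would offset its global pointer by the same amount.

```d
/// Hypothetical helper: round x up to a power-of-two alignment.
size_t alignUp(size_t x, size_t alignment) pure nothrow @safe @nogc
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0,
           "alignment must be a power of two");
    return (x + alignment - 1) & ~(alignment - 1);
}

/// Byte offset of block `blockId`'s scratch area inside the shared global buffer.
size_t scratchOffset(size_t blockId, size_t bytesPerBlock, size_t alignment) pure nothrow @safe @nogc
{
    return blockId * alignUp(bytesPerBlock, alignment);
}

/// Total size of the global buffer to allocate for `numBlocks` blocks.
size_t scratchTotal(size_t numBlocks, size_t bytesPerBlock, size_t alignment) pure nothrow @safe @nogc
{
    return numBlocks * alignUp(bytesPerBlock, alignment);
}

// 100 bytes per block padded to 128-byte alignment: block 3 starts at byte 384.
static assert(scratchOffset(3, 100, 128) == 384);
static assert(scratchTotal(4, 100, 128) == 512);
```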

For programmer-managed cache access, I'd try to bring up a three-line (or so) function injected into the NVPTX file after compilation: extern(C) void* nvptxDynSharedMemBasePointer(), or some such. Hopefully it's as simple as knowing how to return a value from the internal register.

bcarneal, May 29 '21 05:05

The code below works if the dcompute semantic checker in LDC allows string literals; the relevant check is here:

https://github.com/ldc-developers/ldc/blob/master/gen/semantic-dcompute.cpp#L152

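import ldc.dcompute : SharedPointer;
import ldc.llvmasm : __irEx;

// Reserves a statically sized block of shared memory (LLVM addrspace(3)) and returns a
// SharedPointer to its first element, e.g. sharedStaticReserve!(float[64], "tile")().
// T and N are deduced from the array type; uniqueName becomes an IR global, so keep it unique.
SharedPointer!T sharedStaticReserve(T : T[N], string uniqueName, size_t N)(){
    // Prefix (module scope): the shared global, aligned for T, plus a one-field struct
    // type with the same layout as SharedPointer!T.
    void* address = __irEx!(`@`~uniqueName~` = addrspace(3) global [`~Itoa!N~` x `~llvmType!T~`] zeroinitializer, align `~Itoa!(T.alignof)~`
        %Dummy = type { `~llvmType!T~` addrspace(3)* }
            `, `
        ; Address of element 0 of the shared array (GEP indices must be integer-typed).
        %sharedptr = getelementptr inbounds [`~Itoa!N~` x `~llvmType!T~`], [`~Itoa!N~` x `~llvmType!T~`] addrspace(3)* @`~uniqueName~`, i64 0, i64 0

        ; Box the addrspace(3) pointer in a stack struct and return the struct's address.
        ; This relies on the inline IR being inlined into the caller, which copies it out below.
        %.structliteral = alloca %Dummy, align 8
        %dumptr = getelementptr inbounds %Dummy, %Dummy* %.structliteral, i32 0, i32 0
        store `~llvmType!T~` addrspace(3)* %sharedptr, `~llvmType!T~` addrspace(3)** %dumptr

        %vptr = bitcast %Dummy* %.structliteral to i8*
        ret i8* %vptr
            `, ``, void*)();
    // Reinterpret as SharedPointer!T; a fixed SharedPointer!(uint) would be wrong for other T.
    return *(cast(SharedPointer!T*)address);
}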

package:
immutable(string) Digit(size_t n)()
{
    // Single decimal digit as a string; anything above 9 is a compile-time error.
    static assert(n < 10, "Digit expects a value in 0..9");
    return "0123456789"[n .. n + 1];
}


immutable(string) Itoa(uint n)()
{
    // Decimal string of n, built at compile time one digit at a time.
    // n is unsigned, so no sign handling is needed.
    static if (n < 10)
        enum ret = Digit!(n);
    else
        enum ret = Itoa!(n / 10) ~ Digit!(n % 10);
    return ret;
}

immutable(string) llvmType(T)()
{
    static if(is(T == float))
        return "float";
    else static if(is(T == double))
        return "double";
    else static if(is(T == byte) || is(T == ubyte) || is(T == void))
        return "i8";
    else static if(is(T == short) || is(T == ushort))
        return "i16";
    else static if(is(T == int) || is(T == uint))
        return "i32";
    else static if(is(T == long) || is(T == ulong))
        return "i64";
    else
        static assert(0,
            "Can't determine llvm type for D type " ~ T.stringof);
}

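The recursive Digit/Itoa templates above create one template instance per digit. A plain function evaluated through CTFE does the same job with a loop, and its result can be spliced into an IR string via an enum. The following is a minimal sketch under that assumption; toDecimal is a hypothetical name. Like the templates above, it uses string literals, so inside a @compute module it would face the same semantic-checker restriction. That has not been verified here.

```d
/// Hypothetical CTFE-friendly alternative to the recursive Digit/Itoa templates:
/// converts an unsigned value to its decimal string with a loop.
string toDecimal(size_t n) pure nothrow @safe
{
    if (n == 0)
        return "0";
    char[20] buf;            // size_t.max has at most 20 decimal digits
    size_t i = buf.length;
    while (n != 0)
    {
        buf[--i] = cast(char)('0' + n % 10);  // least significant digit first
        n /= 10;
    }
    return buf[i .. $].idup;  // copy the filled tail into an immutable string
}

// Forced to run at compile time, so the result can be concatenated into IR text.
enum lenStr = toDecimal(32);
static assert(lenStr == "32");
static assert(toDecimal(0) == "0");
static assert(toDecimal(1024) == "1024");
```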
aferust, Dec 01 '22 21:12