
Support for RT Core ray tracing (maybe via OptiX)?

HaoxuanGuo opened this issue 3 years ago • 6 comments

Newer NVIDIA RTX-series GPUs include a new hardware unit, the RT Core, which provides fast hardware-accelerated BVH traversal and ray intersection tests.

So, are there any plans to support RT Core programming? I guess it might be realized by calling the OptiX library?

HaoxuanGuo avatar May 14 '21 06:05 HaoxuanGuo

It's not on my TODO list. IIRC @cdsousa has had a look at what would be required to work with OptiX; maybe he has some code?

maleadt avatar May 14 '21 07:05 maleadt

Hi, I'm pasting two conversations I had with @maleadt about this on Slack. I can share some very experimental code (about trying to call the OptiX API), in case someone cares. I'd like to get back to this in the future, but I'm not sure I'll have time for it.

Cristóvão Sousa Sep 25th, 2020 at 09:56 Hi all! As far as I understand, Nvidia's OptiX framework has an API that is used with CUDA (https://raytracing-docs.nvidia.com/optix7/guide/index.html#introduction#overview). Does anyone have a clue about the effort that would be required to enable use of OptiX and CUDA.jl together?


Tim Besard 8 months ago create an artifact, generate C wrappers, integrate with CUDA.jl's context management -- maybe a day to get you something that exposes the C API in a reliable way

Tim Besard 8 months ago but then you probably want some more high-level wrappers, and that takes a lot of time, as you know

Cristóvão Sousa 8 months ago hum, but these would only work for the host API, not for the device API (https://raytracing-docs.nvidia.com/optix7/api/html/group__optix__device__api.html), whose functions (like optixGetLaunchIndex()) I suppose are called from within "kernel" code

Tim Besard 8 months ago yeah, device apis are tricky and generally unsupported

Tim Besard 8 months ago depends on how they are written

Cristóvão Sousa 8 months ago it would help if "llvm.nvvm" instructions were available, right? (Sorry, I'm someone who doesn't really even have the big picture of how all the CUDA stuff works)

Tim Besard 8 months ago llvm.nvvm are compiler intrinsics, and those are not going to be available for optix functionality

Tim Besard 8 months ago if the functions are defined in a C++ header, it's tough

Tim Besard 8 months ago if they're just plain C-compatible calls with a static library to be linked with the kernels, it should be fine

Tim Besard 8 months ago that's how we handle cudadevrt

Tim Besard 8 months ago a similar device library

Cristóvão Sousa 8 months ago ok, thanks for your insights! This is something I may investigate more deeply someday if I see that OptiX would be a big advantage for an experimental differentiable path tracer I'm working on.

Tim Besard 8 months ago

template <typename ReturnT, typename... ArgTypes>
static __forceinline__ __device__ ReturnT optixContinuationCall( unsigned int sbtIndex, ArgTypes... args )
{
    unsigned long long func;
    asm( "call (%0), _optix_call_continuation_callable,(%1);" : "=l"( func ) : "r"( sbtIndex ) : );
    using funcT = ReturnT ( * )( ArgTypes... );
    funcT call  = ( funcT )( func );
    return call( args... );
}


Tim Besard 8 months ago so the calls are fine, they just embed an 'unresolved' call in the bitcode, which is then resolved to (presumably) a device library after linking

Tim Besard 8 months ago hm, can't find such a library though. I wonder where it comes from

Tim Besard 8 months ago huh, so the host part, libnvoptix, is already shipped as part of the driver

Tim Besard 8 months ago and presumably those calls are automatically resolved during PTX compilation too

Tim Besard 8 months ago so you don't even need any artifact

Tim Besard 8 months ago calling the host functions will be a bit tricky, since libnvoptix only exports a single function, optixQueryFunctionTable, and the OptiX headers document the return structure (with pointers to the functions of the API)

Tim Besard 8 months ago funny how all of their libraries have different ways of interfacing...

Cristóvão Sousa 8 months ago so the device API calls would be less tricky than the host ones? That I would not expect

Tim Besard 8 months ago possibly, yes

Tim Besard 8 months ago they'd have to be hand-written in both cases though

Tim Besard 8 months ago so no autogeneration with Clang.jl

Cristóvão Sousa 8 months ago :+1:

Karthik Raj Katipally 8 months ago I tried this when OptiX 7 was out. I remember I had issues with vararg types. I will try to collect my efforts and push whatever I have. It's not much, but hoping it saves time for others.

Cristóvão Sousa 8 months ago Oh interesting, I'd appreciate it.


Cristóvão Sousa Dec 14th, 2020 at 19:20 @maleadt, I've been very slowly trying to understand how Nvidia OptiX could work with Julia/CUDA.jl. So far I've understood that OptiX has a C API to which one sends "programs" (ray generation, intersection) that are nothing more than PTX code (literally a string) compiled from functions with specific signatures. The functions have to have specific name prefixes, like raygen, and receive no parameters. Input to such functions has to be passed through global variables, e.g., .const .align 8 .b8 params[24];, and through calls to a device API defined in asm in C headers, e.g., call (%r1), _optix_get_launch_index_x, ();. @maleadt, can you give some hint on whether any of these three mechanisms may pose a big problem to the current CUDA.jl workings?

Cristóvão Sousa 5 months ago I've seen that PTX from CUDA.jl has some prefixes prepended to function names. I've found nothing about global variables. And I still don't know how to manually translate an asm call from an API header to Julia. But I've been able to call the host API from Julia through hand-made wrappers.

Tim Besard 5 months ago I think all that should be doable

Tim Besard 5 months ago there's a constant memory PR that should make the const globals easily usable too

Tim Besard 5 months ago and for the header calls:

julia> kernel() = ccall("extern _optix_get_launch_index_x", llvmcall, Cint, (), )
kernel (generic function with 1 method)
julia> CUDA.code_ptx(kernel, Tuple{})
//
// Generated by LLVM NVPTX Back-End
//
.version 6.3
.target sm_75
.address_size 64
	// .globl	julia_kernel_1816       // -- Begin function julia_kernel_1816
.extern .func  (.param .b32 func_retval0) _optix_get_launch_index_x
()
;
.weak .global .align 8 .u64 exception_flag;
                                        // @julia_kernel_1816
.visible .func  (.param .b32 func_retval0) julia_kernel_1816()
{
	.reg .b32 	%r<3>;
// %bb.0:                               // %top
	{ // callseq 1, 0
	.reg .b32 temp_param_reg;
	.param .b32 retval0;
	call.uni (retval0), 
	_optix_get_launch_index_x, 
	(
	);
	ld.param.b32 	%r1, [retval0+0];
	} // callseq 1
	st.param.b32 	[func_retval0+0], %r1;
	ret;
                                        // -- End function
}
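
The same pattern can be wrapped into ordinary Julia functions for use inside kernels. A minimal sketch (an illustration, not from the original conversation; the _y/_z symbol names and the Cuint return type are assumptions based on the OptiX device headers):

# Sketch: wrap OptiX device intrinsics as Julia functions, following the
# ccall("extern ...", llvmcall, ...) pattern above. The symbols stay
# unresolved in the generated PTX and are (presumably) resolved by OptiX
# when the program is compiled. _y/_z names and Cuint return are assumed.
optix_launch_index_x() = ccall("extern _optix_get_launch_index_x", llvmcall, Cuint, ())
optix_launch_index_y() = ccall("extern _optix_get_launch_index_y", llvmcall, Cuint, ())
optix_launch_index_z() = ccall("extern _optix_get_launch_index_z", llvmcall, Cuint, ())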

Cristóvão Sousa 5 months ago Nice, thanks!

cdsousa avatar May 14 '21 11:05 cdsousa

@cdsousa - were you able to share your code anywhere?

alhirzel avatar Dec 07 '21 21:12 alhirzel

Here they are. Direct ccall:

import Libdl

# libnvoptix ships with the NVIDIA driver and exports a single entry point,
# optixQueryFunctionTable, which fills in a table of API function pointers.
libnvoptix = Libdl.dlopen("libnvoptix.so.1", Libdl.RTLD_NOW)
optixQueryFunctionTable = Libdl.dlsym(libnvoptix, "optixQueryFunctionTable")

OPTIX_ABI_VERSION = 41

# Storage for the function table (38 entries for this ABI version).
optixFunctionTable = fill(Ptr{Cvoid}(0), 38)

# Variant 1: call by library name.
ret = @ccall "libnvoptix.so.1".optixQueryFunctionTable(OPTIX_ABI_VERSION::Cint, 0::Cint, 0::Cint, 0::Cint, optixFunctionTable::Ptr{Cvoid}, sizeof(optixFunctionTable)::Cint)::Cint

# Variant 2: the equivalent call through the dlsym'd function pointer.
ret = @ccall $optixQueryFunctionTable(OPTIX_ABI_VERSION::Cint, 0::Cint, 0::Cint, 0::Cint, optixFunctionTable::Ptr{Cvoid}, sizeof(optixFunctionTable)::Cint)::Cint

optixFunctionTable
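
Once the table is filled, individual API entries can be invoked through the returned pointers. A minimal sketch (an illustration, not part of the original code; it assumes, per the OptiX 7 optix_function_table.h header, that the first table entry is optixGetErrorName):

# Call the first table entry, assumed (per the OptiX 7 headers) to be
# optixGetErrorName, which maps an OptixResult code to a readable name.
optixGetErrorName_ptr = optixFunctionTable[1]
unsafe_string(@ccall $optixGetErrorName_ptr(0::Cint)::Cstring)  # 0 == OPTIX_SUCCESS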

Through a "JIT compiled" wrapper:

# https://github.com/ingowald/optix7course

import Libdl
import CUDA
import CUDA_jll

# Make sure a CUDA-capable device is present.
cnt = Ref{Cint}(0)
CUDA.cuDeviceGetCount(cnt)
@assert cnt[] >= 1

# Libdl.dlopen("libcudart")

# A small C wrapper around the OptiX host API. Including
# optix_function_table_definition.h instantiates the global function table
# that optixInit() fills in.
optixwrapper_code = """
    #include <cuda_runtime.h>
    #include <optix.h>
    #include <optix_stubs.h>
    #include <optix_function_table_definition.h>
    OptixResult _optixInit(){ return optixInit(); }
    const char* _optixGetErrorString(OptixResult result){ return optixGetErrorString(result); }
    OptixResult _optixDeviceContextCreate(CUcontext fromContext, const OptixDeviceContextOptions* options, OptixDeviceContext * context){return optixDeviceContextCreate(fromContext, options, context);}

    int test(){
        cudaFree(0);   /* force CUDA runtime/context initialization */
        _optixInit();
        return 0;
    }
    """
const optixwrapper_so = tempname()
const optix = optixwrapper_so
optixwrapper_incs = [
    "-I", "NVIDIA-OptiX-SDK-7.0.0-linux64/include/",
    "-I", joinpath(CUDA_jll.artifact_dir, "include")
    ]
optixwrapper_libs = [
    "-L", CUDA_jll.LIBPATH_list[1],
    "-l", "cudart"
    ]
# "JIT compile" the wrapper: pipe the C source into clang via stdin and
# build a shared library at a temporary path.
open(`clang -fPIC -O3 -msse3 -xc -shared -o $(optixwrapper_so * "." * Libdl.dlext) $optixwrapper_incs $optixwrapper_libs -`, "w") do f
    print(f, optixwrapper_code)
end
isdefined(Main, :optixwrapper_handle) && Libdl.dlclose(optixwrapper_handle)
optixwrapper_handle = Libdl.dlopen(optixwrapper_so)

# # -------------------------------------

ret = @ccall optix.test()::Cint
unsafe_string(@ccall optix._optixGetErrorString(ret::Cint)::Cstring)
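
A hypothetical next step through the same wrapper (a sketch, not part of the original code; it assumes, per the OptiX documentation, that a null fromContext selects the current CUDA context, and that null options are accepted):

# Create an OptiX device context via the wrapper's _optixDeviceContextCreate.
# test() above must have run first, so that the CUDA context and the OptiX
# function table are initialized.
ctx = Ref{Ptr{Cvoid}}(C_NULL)   # OptixDeviceContext is an opaque pointer
ret = @ccall optix._optixDeviceContextCreate(C_NULL::Ptr{Cvoid}, C_NULL::Ptr{Cvoid}, ctx::Ptr{Ptr{Cvoid}})::Cint
unsafe_string(@ccall optix._optixGetErrorString(ret::Cint)::Cstring)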

cdsousa avatar Dec 07 '21 23:12 cdsousa

Hi all, I was able to make a proof of concept of creating an OptiX program from a CUDA.jl kernel and launching it from Julia: https://gist.github.com/cdsousa/0e3b1d523b92bd3d3e117044aac75cd1 It is a simple sample that doesn't yet use acceleration structures.

cdsousa avatar Aug 13 '22 10:08 cdsousa

I "ported" the OptiX triangle sample, this time making use of acceleration structures and trace calls, and relying on Clang.jl for the library wrapper: https://gist.github.com/cdsousa/bf49b400300d07e5ecf0fdaea7a58b02

cdsousa avatar Sep 14 '22 07:09 cdsousa

Hi @maleadt and others,

I further developed the code and then packed everything I had in gists (see the comments above) into a package that can be easily tested.

However, the intention is just to prove the concept and demonstrate the solutions to the problems I found along the way. It is not intended to be a real Julia package.

https://github.com/cdsousa/OptiX.jl

(Image: hello_triangle output)

cdsousa avatar Nov 06 '22 11:11 cdsousa

Nice work! There are some truly horrible things in there 😄

It is not intended to be a real Julia package.

So you want to inline this functionality in CUDA.jl? It's probably not too useful for most users, so out of load-time considerations it's probably best to keep this as a separate module at the least.

maleadt avatar Nov 07 '22 09:11 maleadt

Ah, no, it was not my intent to push this into CUDA.jl 😄 I just mentioned it here due to the original request.

I say it is not a real package in the sense that it is just a draft of one. I think it displays some good solutions but others could be greatly improved and many more abstractions should be designed.

I won't have the time or resources to take the PoC to the next level. But if anyone ever does, it could provide an idea of how to overcome many of the OptiX wrapping difficulties. Otherwise, the code can stay as it is, and I'm OK with that (it was an entertaining side project).

cdsousa avatar Nov 07 '22 09:11 cdsousa