Ben Barsdell comments

Results: 37 comments by Ben Barsdell

Cannot use `<limits>` and `<cuda/std/limits>` in the same source file

I'll see if I can take another look at this later this week.

Cannot use `<limits>` and `<cuda/std/limits>` in the same source file

I believe the root cause of this is the `#include ` header being loaded from jitify's builtins and cached, and then, when `#include "climits"` is encountered within libcu++, jitify uses...

Question: Are cuModules shared between kernels from same program

No, currently they are not shared: each kernel instantiation has its own cuModule, so the addresses will be different (I confirmed this with a test). This is arguably a design flaw...

Question: Are cuModules shared between kernels from same program

I think linking will have the same issue because there will still be multiple modules, unless I'm misunderstanding.

> Would it not be possible to simply change the internals so...

Can't build with NVCC option '--Werror cross-execution-space-call' on Windows

Thanks for the PR! I filed an internal bug about the `__host__` `__device__` warnings; it seems to be a compiler issue. I believe it only affects debug builds, but I...

load_program() performance (with large include hierarchies)

Thanks for this feedback; it's an important issue. The situation is a bit tricky. At its heart is the fact that having NVRTC load headers implicitly from the filesystem (which...

load_program() performance (with large include hierarchies)

> We currently generate 1 dynamic header, which is not on the file-system, so this would create a problem for us.

This will still be possible by providing the header...

Is it possible to create program from cuBin or PTX?

In the jitify2 API (under development) you can do this: https://github.com/NVIDIA/jitify/blob/ca7f794/jitify2.hpp#L2153

Using __half with NVRTC and jitify

To ensure `cuda_fp16.h` can be found, you'll need to pass the CUDA Toolkit include directory as a flag like this: `-I/path/to/cuda/include`. Here's a minimal example (it uses `half` which is...

Using __half with NVRTC and jitify

That's right. One option would be to use an environment variable like `CUDA_PATH`.
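
As a rough sketch of the environment-variable idea, the `-I` flag could be built at run time instead of being hard-coded. The helper names `include_flag_from_root` and `cuda_include_flag` and the fallback location `/usr/local/cuda` are assumptions made for illustration; they are not part of jitify or NVRTC.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not a jitify API): turns a toolkit root directory into
// an "-I<root>/include" compiler flag. If `root` is null or empty, the
// caller-supplied fallback root is used instead.
std::string include_flag_from_root(const char* root,
                                   const std::string& fallback_root) {
  assert(!fallback_root.empty());  // the fallback must name a directory
  std::string base =
      (root != nullptr && *root != '\0') ? std::string(root) : fallback_root;
  // Drop trailing path separators so the result has no doubled slash.
  while (base.size() > 1 && (base.back() == '/' || base.back() == '\\')) {
    base.pop_back();
  }
  return "-I" + base + "/include";
}

// Hypothetical wrapper: reads CUDA_PATH from the environment.
// "/usr/local/cuda" is an assumed default, common on Linux installs.
std::string cuda_include_flag() {
  return include_flag_from_root(std::getenv("CUDA_PATH"), "/usr/local/cuda");
}
```

The resulting string can then be passed wherever the hard-coded `-I/path/to/cuda/include` flag would otherwise go. Keeping the string-building function separate from the `std::getenv` call makes the fallback behaviour easy to check without touching the process environment.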