Ben Barsdell

Results: 9 issues by Ben Barsdell

For example, this fails: `static_assert(std::is_same<decltype(ULLONG_MAX), unsigned long long>::value, "");` (`ULLONG_MAX` is actually `unsigned long`): https://github.com/NVIDIA/libcudacxx/blob/4a458fe/include/cuda/std/climits#L46-L47. There may also be an issue on 32-bit systems with this: https://github.com/NVIDIA/libcudacxx/blob/4a458fe/include/cuda/std/climits#L42-L43. These are the only problematic cases...
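
Checks of this kind can be written against the standard host headers, as in the minimal sketch below. Testing libcudacxx itself would presumably mean including `<cuda/std/climits>` and `<cuda/std/type_traits>` and using `cuda::std::is_same` instead; that substitution is an assumption, not something the issue states. The helper `ullong_max_has_correct_type` is hypothetical.

```cpp
#include <climits>
#include <type_traits>

// Each limit macro must have the type of the object it describes.
// A header that defines ULLONG_MAX as an `unsigned long` expression
// fails the first assertion at compile time.
static_assert(std::is_same<decltype(ULLONG_MAX), unsigned long long>::value,
              "ULLONG_MAX must have type unsigned long long");
static_assert(std::is_same<decltype(LLONG_MAX), long long>::value,
              "LLONG_MAX must have type long long");
static_assert(std::is_same<decltype(ULONG_MAX), unsigned long>::value,
              "ULONG_MAX must have type unsigned long");
static_assert(std::is_same<decltype(LONG_MAX), long>::value,
              "LONG_MAX must have type long");

// Hypothetical helper so the same property can be queried from test code.
constexpr bool ullong_max_has_correct_type() {
  return std::is_same<decltype(ULLONG_MAX), unsigned long long>::value;
}
```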

This allows NVRTC to be driven via an `nvrtc_cli` executable that behaves identically to `nvcc` (for the subset of functionality that NVRTC supports). This is useful for testing, and we...

The main reason for this request is to improve error handling. When using the triple-chevron (`<<<...>>>`) launch syntax, CUB currently has to call [cudaPeekAtLastError](https://github.com/NVIDIA/cub/blob/866c576c118ae036fb5c2759ba1e5997967e817c/cub/device/dispatch/dispatch_radix_sort.cuh#L1080) after the launch to check for invalid configuration errors. However,...

type: enhancement
P1: should have

- `Dockerfile.base` contains all the Bifrost dependencies.
- `Dockerfile` inherits from `Dockerfile.base` and contains the Bifrost build.
- The old `Dockerfile.cpu` and `Dockerfile.gpu` are removed.
- The `FROM` line and build...

clean-up

- Allocator::DeallocateRaw is called from within a stream callback to ensure stream-aware behavior. However, it is unsafe to call CUDA APIs from inside a stream callback. While BFCAllocator does not...

size:M
comp:core
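
Because a stream callback must not call CUDA APIs, one generic workaround is for the callback to do nothing but hand the pointer to an ordinary host thread, which performs the real release later. The plain C++ sketch below shows that pattern; `DeferredFreeQueue` and its `release` function are hypothetical stand-ins, and this is not necessarily how the change above addresses the problem.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: collects pointers from contexts where freeing is not
// allowed (e.g. a stream callback) and frees them on a dedicated worker thread.
class DeferredFreeQueue {
 public:
  explicit DeferredFreeQueue(std::function<void(void*)> release)
      : release_(std::move(release)), worker_([this] { run(); }) {}

  ~DeferredFreeQueue() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();  // the worker drains any remaining pointers before exiting
  }

  // Only takes a lock and pushes; makes no CUDA calls, so it is callback-safe.
  void enqueue(void* ptr) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      pending_.push(ptr);
    }
    cv_.notify_one();
  }

 private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
      while (!pending_.empty()) {
        void* ptr = pending_.front();
        pending_.pop();
        lock.unlock();
        release_(ptr);  // the real deallocation happens here, off the callback
        lock.lock();
      }
      if (stopping_) return;
    }
  }

  std::function<void(void*)> release_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<void*> pending_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so it starts after the other members
};
```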

- The version should have been bumped long ago in commit 725d1ddc3ea20 because it added a new serialization member, which broke compatibility. We only noticed this now because the RAPIDS...
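
Why a new serialized member requires a version bump can be illustrated with a small, hypothetical binary format in which the writer records a version number and the reader only consumes fields that exist in that version. `Record`, `serialize`, `deserialize`, and the byte layout are illustrative assumptions, unrelated to the serialization code touched by the commit.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical record. Version 1 had only `id`; version 2 added `flags`.
struct Record {
  std::uint32_t id = 0;
  std::uint32_t flags = 0;  // new in version 2
};

constexpr std::uint32_t kCurrentVersion = 2;

namespace detail {
inline void put_u32(std::string& out, std::uint32_t v) {
  char buf[sizeof v];
  std::memcpy(buf, &v, sizeof v);  // native byte order, for brevity
  out.append(buf, sizeof v);
}
inline std::uint32_t get_u32(const std::string& in, std::size_t& pos) {
  if (pos + sizeof(std::uint32_t) > in.size())
    throw std::runtime_error("truncated input");
  std::uint32_t v;
  std::memcpy(&v, in.data() + pos, sizeof v);
  pos += sizeof v;
  return v;
}
}  // namespace detail

std::string serialize(const Record& r) {
  std::string out;
  detail::put_u32(out, kCurrentVersion);
  detail::put_u32(out, r.id);
  detail::put_u32(out, r.flags);
  return out;
}

// `flags` is read only when version >= 2, so payloads written before the
// member existed still load. Had the member been added without bumping
// kCurrentVersion, the reader could not tell the two layouts apart.
Record deserialize(const std::string& in) {
  std::size_t pos = 0;
  const std::uint32_t version = detail::get_u32(in, pos);
  if (version > kCurrentVersion)
    throw std::runtime_error("unsupported version");
  Record r;
  r.id = detail::get_u32(in, pos);
  if (version >= 2) r.flags = detail::get_u32(in, pos);
  return r;
}
```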

- Replaces C++ lexing/parsing/patching code with a proper lexer implementation, which significantly improves robustness and maintainability.
- Replaces minification logic with robust token-based minification.
- Replaces preprocessing logic with a...
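
Working on tokens instead of raw characters lets a minifier drop comments and whitespace without merging adjacent tokens or touching string contents. The toy sketch below illustrates the idea; `toy::minify` is hypothetical (not Jitify's implementation) and assumes input without preprocessor directives, raw string literals, or digit separators.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

namespace toy {

// Multi-character punctuators, three-character ones first so lexing is greedy.
const char* const kPuncts[] = {"<<=", ">>=", "...", "->*", "::", "->", "++",
                               "--",  "<<",  ">>",  "<=",  ">=", "==", "!=",
                               "&&",  "||",  "+=",  "-=",  "*=", "/=", "%=",
                               "&=",  "|=",  "^=",  ".*",  "##"};

inline bool is_word_char(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

inline bool is_word_or_quote(char c) {
  return is_word_char(c) || c == '"' || c == '\'';
}

// Splits source into tokens, dropping whitespace and comments.
inline std::vector<std::string> lex(const std::string& s) {
  std::vector<std::string> tokens;
  std::size_t i = 0;
  while (i < s.size()) {
    const char c = s[i];
    if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
    if (s.compare(i, 2, "//") == 0) {  // line comment: skip to newline
      i = s.find('\n', i);
      if (i == std::string::npos) break;
      continue;
    }
    if (s.compare(i, 2, "/*") == 0) {  // block comment
      const std::size_t end = s.find("*/", i + 2);
      i = (end == std::string::npos) ? s.size() : end + 2;
      continue;
    }
    const std::size_t start = i;
    if (c == '"' || c == '\'') {  // string or char literal, honoring escapes
      ++i;
      while (i < s.size() && s[i] != c) i += (s[i] == '\\') ? 2 : 1;
      i = std::min(i + 1, s.size());
    } else if (is_word_char(c)) {  // identifier, keyword, or number
      while (i < s.size() && is_word_char(s[i])) ++i;
    } else {  // punctuator: longest listed match, else one character
      std::size_t len = 1;
      for (const char* p : kPuncts) {
        const std::string op(p);
        if (s.compare(i, op.size(), op) == 0) { len = op.size(); break; }
      }
      i += len;
    }
    tokens.push_back(s.substr(start, i - start));
  }
  return tokens;
}

// True if writing `a` directly followed by `b` could change how they lex.
inline bool needs_space(const std::string& a, const std::string& b) {
  const char x = a.back(), y = b.front();
  if (is_word_or_quote(x) && is_word_or_quote(y)) return true;
  const std::string pair{x, y};
  if (pair == "//" || pair == "/*") return true;  // would start a comment
  for (const char* p : kPuncts)
    if (std::string(p).compare(0, 2, pair) == 0) return true;
  return false;
}

// Re-emits the tokens with a single space only where one is required.
inline std::string minify(const std::string& src) {
  const std::vector<std::string> tokens = lex(src);
  std::string out;
  for (std::size_t k = 0; k < tokens.size(); ++k) {
    if (k > 0 && needs_space(tokens[k - 1], tokens[k])) out += ' ';
    out += tokens[k];
  }
  return out;
}

}  // namespace toy
```

Note that `a + +b` keeps its space (becoming `a+ +b`), because emitting `a++b` would re-lex as a different token sequence.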

- Refactors all options handling to use a single parser implementation and new `Options` and `OptionsVec` classes.
- This makes the code significantly cleaner and more robust.
- Maintains backwards...
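
One benefit of routing every option through a single parser is that equivalent spellings normalize to the same representation. The sketch below illustrates that idea with a hypothetical `Option` struct and `parse_options` function (not Jitify's actual `Options`/`OptionsVec` API); the accepted spellings (`-I<dir>`, `-I <dir>`, `--include-path=<dir>`, and the `-D` equivalents) are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical normalized option: canonical long name plus optional value.
struct Option {
  std::string name;   // e.g. "include-path"
  std::string value;  // empty for flags without a value
};

// Assumed short-to-long aliases for options that take a value.
const std::map<std::string, std::string> kShortAliases = {
    {"-I", "include-path"}, {"-D", "define-macro"}};

// Normalizes "-Xvalue", "-X value", "--name=value" and "--flag" forms.
std::vector<Option> parse_options(const std::vector<std::string>& args) {
  std::vector<Option> out;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& arg = args[i];
    if (arg.compare(0, 2, "--") == 0) {  // --name or --name=value
      const std::size_t eq = arg.find('=');
      if (eq == std::string::npos)
        out.push_back({arg.substr(2), ""});
      else
        out.push_back({arg.substr(2, eq - 2), arg.substr(eq + 1)});
      continue;
    }
    const auto alias = kShortAliases.find(arg.substr(0, 2));
    if (alias != kShortAliases.end()) {  // joined or separate value
      std::string value = arg.substr(2);
      if (value.empty() && i + 1 < args.size()) value = args[++i];
      out.push_back({alias->second, value});
      continue;
    }
    out.push_back({arg, ""});  // unrecognized: keep verbatim
  }
  return out;
}
```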

- Removes use of regex, which was slow.
- Makes `jitify2_preprocess` output faster to compile.