Christopher Harris

Results: 20 comments by Christopher Harris

In a distributed scenario where each node processes a byte range, all byte ranges are adjacent and contiguous w.r.t. one another, and the byte ranges cover the entirety of the...

@upsj yeah, this is something I wanted to add to the data chunk reader from the beginning, I just didn't have a use case for it. Anywhere we need to...

The OOM portion of the bug is solved by switching `int` to `size_t` on this line: https://github.com/NVIDIA/thrust/blob/fa54f2c6f1217237953f27ddf67f901b6b34fbdd/thrust/system/cuda/detail/scan_by_key.h#L737 However, that reveals an invalid memory access exception deeper in the kernel. Right...

In reality, it looks like `tile_base` jumps from near `INT_MAX` to near the maximum `size_t` value once it reaches a certain threshold. I'll post some example output and a printf I've...
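A jump like this is what you would see if a 32-bit product wrapped past `INT_MAX` and the resulting negative value were then widened to `size_t`. The following is a minimal host-side sketch of that mechanism, not the Thrust kernel itself. The helper name `widened_after_wrap` and the tile indices are hypothetical, the tile size of `9 * 256` is borrowed from the reproducer quoted elsewhere in this thread, and the sketch assumes a 64-bit `size_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: models a tile offset computed in 32 bits and only
// afterwards widened to size_t.
std::size_t widened_after_wrap(std::uint32_t tile_idx, std::uint32_t items_per_tile)
{
    // Unsigned 32-bit multiplication wraps modulo 2^32 (well-defined behavior).
    std::uint32_t wrapped = tile_idx * items_per_tile;
    // Reinterpret as a signed 32-bit value; this is modular on mainstream
    // compilers and guaranteed to be so since C++20.
    std::int32_t as_int = static_cast<std::int32_t>(wrapped);
    // Widening a negative int to size_t sign-extends, giving a value near SIZE_MAX.
    return static_cast<std::size_t>(as_int);
}

// Small self-check around the threshold for items_per_tile = 9 * 256 = 2304.
void check_wrap_threshold()
{
    // Last tile index whose offset still fits in a signed 32-bit int.
    assert(widened_after_wrap(932067u, 2304u) == 2147482368u);
    // One tile later the offset wraps negative and lands near SIZE_MAX.
    assert(widened_after_wrap(932068u, 2304u) ==
           static_cast<std::size_t>(-2147482624LL));
}
```

With a tile size of 2304, index 932067 still yields 2147482368 (just under `INT_MAX`), while index 932068 yields a value within about 2^31 of `SIZE_MAX`.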

I've added the following printf immediately following the `sync_threadblock()` which appears after the calls to `scan_tile(...)` in `scan_by_key.h` ``` if (threadIdx.x == 0) { printf("AFTER SCAN TILE... tile_idx: %i, tile_base:...

Fixed the `tile_base` value by applying a `static_cast` to `ITEMS_PER_TILE` here: https://github.com/NVIDIA/thrust/blob/fa54f2c6f1217237953f27ddf67f901b6b34fbdd/thrust/system/cuda/detail/scan_by_key.h#L516 Still need to verify high-level functionality.

With the aforementioned changes: ```diff diff --git a/thrust/system/cuda/detail/scan_by_key.h b/thrust/system/cuda/detail/scan_by_key.h index fe4b321c..b3974c69 100644 --- a/thrust/system/cuda/detail/scan_by_key.h +++ b/thrust/system/cuda/detail/scan_by_key.h @@ -513,7 +513,7 @@ namespace __scan_by_key { scan_op(scan_op_) { int tile_idx = blockIdx.x; -...

It doesn't matter which argument gets cast to `uint64_t`, just so long as they're both promoted before multiplication. ```cu #include enum { ITEMS_PER_TILE = 9 * 256 }; __global__ void...
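The promotion point can also be checked on the host, since the usual arithmetic conversions are the same in device code. This is a minimal sketch rather than the truncated reproducer above: the function names `tile_base_cast_left` and `tile_base_cast_right` are hypothetical, and the tile indices are chosen only to push the product past `INT_MAX`.

```cpp
#include <cassert>
#include <cstdint>

enum { ITEMS_PER_TILE = 9 * 256 };

// Cast the left operand: tile_idx becomes 64-bit, then ITEMS_PER_TILE is
// converted to match before the multiplication happens.
std::uint64_t tile_base_cast_left(int tile_idx)
{
    return static_cast<std::uint64_t>(tile_idx) * ITEMS_PER_TILE;
}

// Cast the right operand: ITEMS_PER_TILE becomes 64-bit, then tile_idx is
// converted to match before the multiplication happens.
std::uint64_t tile_base_cast_right(int tile_idx)
{
    return tile_idx * static_cast<std::uint64_t>(ITEMS_PER_TILE);
}

// Both variants agree for non-negative indices, including products > INT_MAX.
void check_promotion()
{
    assert(tile_base_cast_left(932068) == 2147484672ULL);
    assert(tile_base_cast_right(932068) == 2147484672ULL);
    assert(tile_base_cast_left(2000000) == tile_base_cast_right(2000000));
}
```

Either cast works because the other operand is converted to the wider type before the multiply. Casting only the finished product would be too late, since the 32-bit multiplication would already have overflowed.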

Changing `scan_tile_status` to 33 bits shouldn't be a major issue, but last I checked (which was a while ago) `atomic_ref` was WIP. Is that still the case? I'll take a...