Gao, Xiang issues

Results: 25 issues of Gao, Xiang

add support for Microsoft Office and LibreOffice

Could the difference between torch.view and torch.reshape be explained in the notebook?

This file: https://github.com/zergtant/pytorch-handbook/blob/master/chapter1/1_tensor_tutorial.ipynb

Device-side launch of thrust::lower_bound is creating wrong results

```C++ #include #include #include __global__ void lowerbound(float inp_val) { constexpr int size = 6; float a[size] = {0.1, 0.2, 0.4, 0.6, 0.8, 1.}; auto result = thrust::lower_bound( thrust::device, a, a...

type: bug: functional

P1: should have

helps: pytorch

backend: CUDA
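For comparison, the semantics a correct `lower_bound` should have can be sketched on the host with `std::lower_bound`. This is only a reference sketch, not the reporter's kernel: the helper name `lower_bound_index` is hypothetical, and the query values mentioned afterwards are made up, since the excerpt above is truncated and does not show which value was searched for.

```C++
#include <algorithm>
#include <cstddef>

// Hypothetical host-side reference helper (not part of Thrust).
// Returns the index of the first element in the sorted range [a, a + n)
// that is not less than `value`; returns n if no such element exists.
std::size_t lower_bound_index(const float* a, std::size_t n, float value) {
    return static_cast<std::size_t>(std::lower_bound(a, a + n, value) - a);
}
```

With the six-element array from the excerpt, a query of 0.5 would be expected to land on index 3 (the element 0.6), and a query larger than every element on index 6.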

BlockRadixSort needs overloads that take the problem size and correctly set the padding value for unused inputs

I am trying `cub::BlockRadixSort` with PyTorch. It performs well, but I find it hard to use. For example, if I want to sort 1023 elements, then I...

type: enhancement

P2: nice to have

helps: pytorch
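The padding idea in the title can be illustrated with a host-side analogue. This is a sketch under assumptions, not the CUB API: the function name `sort_padded_tile` and the choice of `std::numeric_limits<float>::max()` as the padding value for an ascending sort are illustrative, and the truncated excerpt does not show what the reporter actually does.

```C++
#include <algorithm>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical host-side analogue of sorting a fixed-size tile that is only
// partially filled. Unused slots are padded with the largest float so that,
// in an ascending sort, the padding ends up after all valid elements.
std::vector<float> sort_padded_tile(const std::vector<float>& valid,
                                    std::size_t tile_size) {
    if (valid.size() > tile_size) {
        throw std::invalid_argument("input does not fit in the tile");
    }
    std::vector<float> tile(tile_size, std::numeric_limits<float>::max());
    std::copy(valid.begin(), valid.end(), tile.begin());
    std::sort(tile.begin(), tile.end());
    tile.resize(valid.size());  // drop the padding slots at the end
    return tile;
}
```

For a descending sort the padding value would presumably be the lowest representable value instead, which is the kind of detail the requested overloads would hide from the caller.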

Optimize cub::DeviceSegmented[Radix]Sort for small number of segments

Currently, `cub::DeviceSegmentedRadixSort` launches `num_segments` blocks, and each block works on one segment. This approach performs poorly when the number of segments is small: https://github.com/pytorch/pytorch/issues/63456. For small number...

type: enhancement

area: performance

P3: backlog
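What a segmented sort computes can be sketched sequentially on the host. This is an illustration only: the function name `segmented_sort` and the offsets layout (`num_segments + 1` entries, with segment `i` spanning `[offsets[i], offsets[i + 1])`) are assumptions made for the example, not a description of the CUB implementation.

```C++
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference: sorts each segment of `keys` independently.
// `offsets` is assumed to hold num_segments + 1 entries, where segment i
// covers the half-open index range [offsets[i], offsets[i + 1]).
void segmented_sort(std::vector<int>& keys,
                    const std::vector<std::size_t>& offsets) {
    for (std::size_t seg = 0; seg + 1 < offsets.size(); ++seg) {
        auto first = keys.begin() + static_cast<std::ptrdiff_t>(offsets[seg]);
        auto last = keys.begin() + static_cast<std::ptrdiff_t>(offsets[seg + 1]);
        std::sort(first, last);  // one loop iteration handles one segment
    }
}
```

Each loop iteration here plays the role of one block in the scheme described above, which suggests why a handful of large segments would leave most of the device idle.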

Allow iterators in cub::DeviceRadixSort

Currently, `cub::DeviceRadixSort` only supports operating on pointers: ```C++ template static CUB_RUNTIME_FUNCTION cudaError_t SortPairs (void *d_temp_storage, size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, const ValueT *d_values_in, ValueT *d_values_out, int num_items, int...

type: enhancement

P2: nice to have

helps: pytorch
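The difference between a pointer-only and an iterator-generic interface can be sketched on the host. This is a minimal illustration, not a proposed CUB signature: `sort_keys` and its template parameters are hypothetical names, and `std::sort` stands in for the radix sort.

```C++
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical host-side sketch: the input and output are template iterator
// types rather than raw `const KeyT*` / `KeyT*` pointers, so callers can pass
// reverse iterators, transforming iterators, and so on.
template <typename KeyInputIt, typename KeyOutputIt>
void sort_keys(KeyInputIt keys_in, KeyOutputIt keys_out, int num_items) {
    using KeyT = typename std::iterator_traits<KeyInputIt>::value_type;
    std::vector<KeyT> buffer;
    buffer.reserve(static_cast<std::size_t>(num_items));
    std::copy_n(keys_in, num_items, std::back_inserter(buffer));
    std::sort(buffer.begin(), buffer.end());  // stand-in for the radix sort
    std::copy(buffer.begin(), buffer.end(), keys_out);
}
```

A raw pointer still works as an argument because pointers are iterators, so a generic signature of this shape would not break pointer-based callers.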

[WIP] Allow cub::DeviceRadixSort and cub::DeviceSegmentedRadixSort to use iterator as input

Fixes https://github.com/NVIDIA/cccl/issues/868

P3: backlog

helps: pytorch

Support named tensor

Increase the chemical space coverage of unit tests

Improve test coverage