Gao, Xiang
This file: https://github.com/zergtant/pytorch-handbook/blob/master/chapter1/1_tensor_tutorial.ipynb
```C++
#include
#include
#include

__global__ void lowerbound(float inp_val) {
  constexpr int size = 6;
  float a[size] = {0.1, 0.2, 0.4, 0.6, 0.8, 1.};
  auto result = thrust::lower_bound(
      thrust::device, a, a...
```
I am trying `cub::BlockRadixSort` with PyTorch. It performs well, but I find it hard to use. For example, if I want to sort 1023 elements, then I...
Currently, `cub::DeviceSegmentedRadixSort` launches `num_segments` blocks, and each block works on one segment. This approach performs poorly when the number of segments is small: https://github.com/pytorch/pytorch/issues/63456. For small number...
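For readers unfamiliar with segmented sorting, here is a plain-Python reference of what the operation computes. It is only a CPU sketch of the semantics, not CUB's implementation; the function name `segmented_sort` and the sample data are illustrative assumptions. Each loop iteration corresponds to the work that, per the description above, one thread block does on one segment.

```python
def segmented_sort(keys, begin_offsets, end_offsets):
    """Sort each segment keys[begin:end] independently and return a new list.

    Hypothetical CPU reference: segment i covers the half-open range
    [begin_offsets[i], end_offsets[i]). Empty segments are allowed.
    """
    out = list(keys)
    for begin, end in zip(begin_offsets, end_offsets):
        # One iteration here plays the role of one block working on one segment.
        out[begin:end] = sorted(out[begin:end])
    return out


keys = [8, 6, 7, 5, 3, 0, 9]
offsets = [0, 3, 3, 7]  # three segments: [0, 3), [3, 3), [3, 7)
result = segmented_sort(keys, offsets[:-1], offsets[1:])
print(result)  # [6, 7, 8, 0, 3, 5, 9]
```

With only a handful of segments, a one-block-per-segment scheme launches only a handful of blocks, which is why the number of segments matters for GPU utilization.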
Currently, `cub::DeviceRadixSort` only supports operating on pointers:
```C++
template <typename KeyT, typename ValueT>
static CUB_RUNTIME_FUNCTION cudaError_t SortPairs(
    void *d_temp_storage, size_t &temp_storage_bytes,
    const KeyT *d_keys_in, KeyT *d_keys_out,
    const ValueT *d_values_in, ValueT *d_values_out,
    int num_items, int...
```
Fixes https://github.com/NVIDIA/cccl/issues/868
Because writing something like
```python
atomic_energies.sum(dim='atoms')
```
is much more readable than
```python
atomic_energies.sum(1)
```
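To make the readability argument concrete, here is a minimal pure-Python sketch of resolving a dimension name to a positional axis. The helper `sum_named`, the dimension names, and the sample energies are all hypothetical and only illustrate the idea for a 2D nested list; this is not how any particular library implements named dimensions.

```python
def sum_named(rows, names, dim):
    """Sum a 2D nested list over the dimension called `dim`.

    `names` gives one name per axis, e.g. ("molecules", "atoms").
    Hypothetical helper for illustration only.
    """
    axis = names.index(dim)  # raises ValueError for an unknown name
    if axis == 0:
        # Reduce over rows: one total per column.
        return [sum(col) for col in zip(*rows)]
    # axis == 1: reduce over columns, one total per row.
    return [sum(row) for row in rows]


names = ("molecules", "atoms")
atomic_energies = [
    [-0.5, -0.25, -0.25],  # molecule 0
    [-1.0, -0.5, -0.5],    # molecule 1
]

per_molecule = sum_named(atomic_energies, names, "atoms")
print(per_molecule)  # [-1.0, -2.0]
```

The call site states which dimension is being reduced, so the reader does not need to remember that axis `1` happens to be the atoms axis.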
- [x] Add 3D structures from NIST https://github.com/aiqm/torchani/pull/146
- [ ] Add more off-equilibrium structures, reactions
- [x] Test structure optimization https://github.com/aiqm/torchani/pull/153