cloudhan
**Describe the bug** **Steps/Code to reproduce bug** ```cuda #include "cute/tensor.hpp" using namespace cute; __global__ void kernel() { constexpr auto weird = right_inverse(make_layout(_2{}, _1{})); print(weird); } int main() { kernel(); cudaDeviceSynchronize();...
**Describe the bug** As of b7508e337938137a699e486d8997646980acfc58, `Copy_Atom` causes a misaligned address error. **Steps/Code to reproduce bug** ```cuda #include using namespace cute; __global__ void kernel(int m, int k, float* a, int lda) {...
**Describe the bug** `make_tiled_copy` also should not secretly pad `Thr` and `Val`. See code sample and discussion. **Steps/Code to reproduce bug** ```cpp #include using namespace cute; int main() { std::vector...
`/opt/rocm/.info/version-dev` is only available if the `rocm-dev` metapackage is installed. That metapackage pulls in many packages that users do not need; they may opt for fine...
Potentially fixes #238
https://github.com/Jimver/cuda-toolkit/issues/315 Waiting for the upstream fix will be OK.
Hi, this is not an issue. I'd like to let visitors know that I am developing [rules_cuda](https://github.com/cloudhan/rules_cuda.git). It has the following features: 1. Pure Starlark implementation 2. Supports both...
Some chat from Slack. Me: > @Gisle Dankel Do you have the context of why static linking is required on Windows for libkineto? Gisle: > It’s not - in fact...
With ~10 steps of resnet50 profiled, there is a roughly 2.3 s freeze on a 10900X machine. The freeze is caused by **Recalculate Style** and **Layout**, which,...