cloudhan

Results 191 comments of cloudhan

I cannot reproduce it with CUDA 11.5. If the same error doesn't occur with a newer version of CUDA, then I think you'd better report it to the jax team directly. Note,...

Based on your command line, you installed jaxlib and jax correctly. This is indeed very strange. Are you using the Python distributed from the Microsoft Store? If so, please stop using it and...

OK, I was thinking that you were not the only victim of Microsoft Python (https://github.com/cloudhan/jax-windows-builder/issues/16), but it turns out that you are stepping into the same river twice.

I also face a similar issue. The repo was cloned with `git clone -b --filter=blob:none`. `git checkout main && git move -s -d main` crashes and when `git show `...

with
```cuda
if (thread(255)) {
  print(stripe_gA(_, _, _0{}, 0));
  print(thr_copy.partition_S(stripe_gA(_, _, _0{}, 0)));
  print_tensor(staging_a);
}
```
```cuda
gmem_ptr[32b](0xb04c00000) o (_128,_8):(_1,1024) //
```

In summary, this hidden behaviour limits the usage of `make_tiled_copy`: in some edge cases, it produces incomprehensible and unexpected "incorrect" results.
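For readers unfamiliar with the notation in the printed tensor, `(_128,_8):(_1,1024)` is a shape:stride pair, and a coordinate `(i, j)` maps to the offset `i*1 + j*1024`. The sketch below models only that arithmetic in plain C++ so it can be checked without CuTe. `Layout2D`, `kGmemLayout` and `check_examples` are hypothetical names invented for illustration, not CuTe APIs, and the sketch says nothing about how `make_tiled_copy` partitions the tensor.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a rank-2 CuTe layout: shape (s0, s1), stride (d0, d1).
// This is NOT the CuTe Layout type; it only models the coordinate-to-offset map.
struct Layout2D {
    std::size_t shape0, shape1;
    std::size_t stride0, stride1;

    // Offset of coordinate (i, j): inner product of the coordinate with the strides.
    constexpr std::size_t operator()(std::size_t i, std::size_t j) const {
        return i * stride0 + j * stride1;
    }

    // Number of coordinates in the layout's domain.
    constexpr std::size_t size() const { return shape0 * shape1; }
};

// The layout printed above: (_128,_8):(_1,1024).
constexpr Layout2D kGmemLayout{128, 8, 1, 1024};

// Spot checks of the mapping; call from main() or a test.
inline void check_examples() {
    assert(kGmemLayout(0, 0) == 0);       // first element
    assert(kGmemLayout(1, 0) == 1);       // mode 0 is contiguous (stride 1)
    assert(kGmemLayout(0, 1) == 1024);    // mode 1 jumps 1024 elements
    assert(kGmemLayout(127, 7) == 127 + 7 * 1024);  // last coordinate
}
```

With these strides the 128 elements along the first mode are contiguous, while each step along the second mode skips 1024 elements, so the 128x8 tile is a strided window into a larger buffer and not a dense block.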

I happened to create my own utility function: ```cpp #include "cute/tensor.hpp" using namespace cute; template auto make_tv_view(Tensor&& tensor, const ThrLayout& thr_layout = {}, const ValLayout& val_layout = {}) { auto...

Great to know that I can use an underscore in `compose`.

EDIT: removed unnecessary compose. @ccecka The `.compose(layout_tv, _)` does not work for the 3D case. More work for you now =). ```cpp #include using namespace cute; template __host__ __device__ constexpr auto...

I found this because I applied the simplification, and it works pretty well for the 1D and 2D cases. For 3D, my sliced tensor is pretty weird and I scratched my...