Dingjifeng

4 issues by Dingjifeng

Here I wrote a simple CuTe DSL program to cast an fp32 tensor to a bf16 tensor:

```
import argparse
import math
import torch
import triton
from typing import...
```

Labels: question, ? - Needs Triage
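When checking the output of a cast kernel like the one above, a host-side reference is useful. The sketch below uses only the Python standard library and assumes the kernel rounds fp32 to bf16 with round-to-nearest-even; a kernel that truncates instead would differ in the last bit. All helper names (`fp32_to_bf16_bits`, `bf16_bits_to_float`) are hypothetical and not part of CuTe DSL.

```python
import math
import struct


def fp32_bits(x: float) -> int:
    """Return the 32-bit pattern of x after rounding it to IEEE-754 binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_fp32(bits: int) -> float:
    """Interpret a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def fp32_to_bf16_bits(x: float) -> int:
    """Round an fp32 value to bf16 (assumed round-to-nearest-even); return 16 bits."""
    bits = fp32_bits(x)
    if math.isnan(x):
        # Keep the sign and force a quiet-NaN mantissa bit so dropping the
        # low 16 bits cannot turn the NaN into an infinity.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Adding 0x7FFF plus the lowest kept bit rounds to nearest, with exact
    # ties going to the even (lsb == 0) bf16 mantissa.
    lsb = (bits >> 16) & 1
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen a bf16 pattern back to fp32 by appending 16 zero bits."""
    return bits_to_fp32((b & 0xFFFF) << 16)
```

To compare against kernel output, round each fp32 input with `bf16_bits_to_float(fp32_to_bf16_bits(v))` and check for exact equality with the values read back from the bf16 tensor.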

I want to implement a GemmSM90 where A is bf16 and B is fp32. I want to cast B from fp32 to bf16 in shared memory (smem) in these steps: 1. load fp32 smem to...

Labels: question, ? - Needs Triage

```
    tXsH0 = thr_copy_h0_s2r.partition_S(sH0[:,:,0])
                                        ^^^^^^^^^^
  File "/mnt/shared-storage-user/anaconda3/envs/fla/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cute/tensor.py", line 559, in _check_can_load_store
    raise NotImplementedError(
NotImplementedError: load & store swizzled memory is not supported yet: tensor
```

It seems unable to print...

Labels: question, ? - Needs Triage

When using clangd, I get: `Could not build CompilerInvocation for file cute/csrc/gemm_sm80/gemm_sm80.cu`

Folder:

```
$ tree -L 3
.
├── csrc
│   ├── build
│   │   ├── CMakeCache.txt
│   │   ├── CMakeFiles...
```