ghostplant

40 issue results for ghostplant

I only see point-wise Matmul examples in the tutorial.

cuBLASLt

I get an error when dispatching attention with head_size = 576:

```sh
Error: assert(head_size == 128 || head_size == 256);
```

**option-1**: flashinfer-ai/flashinfer
**option-2**: deepseek-ai/FlashMLA

Does anyone have performance numbers comparing these?

I tried to build libdawn for Linux-aarch64. However, the generated libdawn.so is around 300-400 MB, while the released version [here](https://github.com/jspanchu/webgpu-dawn-binaries/releases/tag/v127.0.6535.0) is just over 20 MB. How to reduce the...

Two questions:
1. Can DeepEP run normally in cudaGraph mode?
2. Does DeepEP perform "dropless" MoE dispatch? (i.e., no token is discarded if tokens are heavily routed to a limited number...
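To make the second question concrete, here is a minimal sketch (plain Python, not DeepEP's actual API) contrasting capacity-based dispatch, which discards overflow tokens, with "dropless" dispatch, which keeps every token no matter how skewed the routing is:

```python
# Toy MoE dispatch: route each token to its assigned expert's bucket.
# With a capacity limit, tokens beyond the limit are dropped; with
# capacity=None ("dropless"), every token is kept.
def dispatch(tokens, expert_of, n_experts, capacity=None):
    buckets = [[] for _ in range(n_experts)]
    dropped = []
    for tok, e in zip(tokens, expert_of):
        if capacity is not None and len(buckets[e]) >= capacity:
            dropped.append(tok)  # capacity overflow: token discarded
        else:
            buckets[e].append(tok)
    return buckets, dropped

tokens = list(range(8))
expert_of = [0, 0, 0, 0, 0, 1, 1, 2]  # heavily skewed toward expert 0

_, dropped_cap = dispatch(tokens, expert_of, n_experts=4, capacity=2)
_, dropped_free = dispatch(tokens, expert_of, n_experts=4, capacity=None)
print(len(dropped_cap), len(dropped_free))  # 3 0
```

Under skewed routing the capacity-limited variant drops 3 of 8 tokens, while the dropless variant drops none; the question is whether DeepEP guarantees the latter behavior.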

I'd like to run OpenCL on the CPU. Given that Intel's OpenCL runtime is not compatible with aarch64, is there any libOpenCL implementation for aarch64?

To reproduce:
**Command:** `./bin/ckProfiler gemm 2 1 1 2 0 1 32 512 7168 -1 -1 -1 3 100`
**GPU Type:** MI300x
**Searched Perf:** `Best Perf for datatype = bf16...

Under Investigation

I want to read from gguf and do downstream inference, but I get these output fields for an IQ1_S-quantized tensor:

```sh
ReaderTensor(name='blk.22.ffn_down_exps.weight', tensor_type=, shape=memmap([2048, 7168, 256], dtype=uint64), n_elements=3758096384, n_bytes=734003200, data_offset=48...
```
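The reported fields are at least internally consistent. A quick sanity check using only the numbers quoted above (the 1.5625 bits-per-weight figure is the nominal IQ1_S packing rate, which is an assumption if the tensor type is actually something else):

```python
# Fields reported by gguf's ReaderTensor for blk.22.ffn_down_exps.weight
shape = (2048, 7168, 256)
n_elements = 3758096384
n_bytes = 734003200

# The element count is the product of the logical shape dimensions.
prod = 1
for d in shape:
    prod *= d
assert prod == n_elements

# Effective storage density of the packed quant blocks.
bits_per_weight = n_bytes * 8 / n_elements
print(bits_per_weight)  # 1.5625, matching the nominal IQ1_S rate
```

Note that `n_bytes` counts the packed quantized data, so reading the raw bytes at `data_offset` yields quant blocks that must be dequantized before use, not a dense array of the logical shape.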

```sh
$ ftllm run Qwen/Qwen3-0.6B
Load libnuma.so.1
CPU Instruction Info: [AVX512F: ON] [AVX512_VNNI: ON] [AVX512_BF16: ON]
Load libfastllm_tools-cpu.so
Segmentation fault (core dumped)
```