ghostplant

40 issue results for ghostplant

I only see point-wise Matmul examples in the tutorial.

cuBLASLt

I get an error when dispatching attention with head_size = 576:

```sh
Error: assert(head_size == 128 || head_size == 256);
```

**option-1**: flashinfer-ai/flashinfer
**option-2**: deepseek-ai/FlashMLA

Does anyone have performance numbers comparing these?

I tried to build libdawn for Linux-aarch64. However, the generated libdawn.so is around 300-400 MB, while the released version [here](https://github.com/jspanchu/webgpu-dawn-binaries/releases/tag/v127.0.6535.0) is just over 20 MB. How to reduce the...

Two questions:
1. Can DeepEP run normally in cudaGraph mode?
2. Does DeepEP perform "dropless" MoE dispatch? (i.e., no token is discarded if tokens are heavily routed to a limited number...
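To make the second question concrete, here is a minimal sketch (plain Python, not DeepEP's actual API) contrasting capacity-based dispatch, which discards overflow tokens, with "dropless" dispatch, which keeps every token no matter how skewed the routing is:

```python
# Toy MoE dispatch: route each token to its assigned expert's bucket.
# With a capacity limit, tokens beyond the limit are dropped; with
# capacity=None ("dropless"), every token is kept.
def dispatch(tokens, expert_of, n_experts, capacity=None):
    buckets = [[] for _ in range(n_experts)]
    dropped = []
    for tok, e in zip(tokens, expert_of):
        if capacity is not None and len(buckets[e]) >= capacity:
            dropped.append(tok)  # capacity overflow: token discarded
        else:
            buckets[e].append(tok)
    return buckets, dropped

tokens = list(range(8))
expert_of = [0, 0, 0, 0, 0, 1, 1, 2]  # heavily skewed toward expert 0

_, dropped_cap = dispatch(tokens, expert_of, n_experts=4, capacity=2)
_, dropped_free = dispatch(tokens, expert_of, n_experts=4, capacity=None)
print(len(dropped_cap), len(dropped_free))  # 3 0
```

Under skewed routing the capacity-limited variant drops 3 of 8 tokens, while the dropless variant drops none; the question is whether DeepEP guarantees the latter behavior.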

I'd like to run OpenCL on the CPU. Given that Intel's OpenCL runtime is not compatible with aarch64, is there any libOpenCL implementation for aarch64?

To reproduce:
**Command:** `./bin/ckProfiler gemm 2 1 1 2 0 1 32 512 7168 -1 -1 -1 3 100`
**GPU Type:** MI300x
**Searched Perf:** `Best Perf for datatype = bf16...

Under Investigation

I want to read from gguf and do downstream inference, but I get these output fields for an IQ1_S-quantized tensor:

```sh
ReaderTensor(name='blk.22.ffn_down_exps.weight', tensor_type=, shape=memmap([2048, 7168, 256], dtype=uint64), n_elements=3758096384, n_bytes=734003200, data_offset=48...
```
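The reported fields are at least internally consistent. A quick sanity check using only the numbers quoted above (the 1.5625 bits-per-weight figure is the nominal IQ1_S packing rate, which is an assumption if the tensor type is actually something else):

```python
# Fields reported by gguf's ReaderTensor for blk.22.ffn_down_exps.weight
shape = (2048, 7168, 256)
n_elements = 3758096384
n_bytes = 734003200

# The element count is the product of the logical shape dimensions.
prod = 1
for d in shape:
    prod *= d
assert prod == n_elements

# Effective storage density of the packed quant blocks.
bits_per_weight = n_bytes * 8 / n_elements
print(bits_per_weight)  # 1.5625, matching the nominal IQ1_S rate
```

Note that `n_bytes` counts the packed quantized data, so reading the raw bytes at `data_offset` yields quant blocks that must be dequantized before use, not a dense array of the logical shape.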

```sh
$ ftllm run Qwen/Qwen3-0.6B
Load libnuma.so.1
CPU Instruction Info: [AVX512F: ON] [AVX512_VNNI: ON] [AVX512_BF16: ON]
Load libfastllm_tools-cpu.so
Segmentation fault (core dumped)
```