Short context length on Qwen quantized examples.

Open. AlpineVibrations opened this issue 4 months ago • 1 comment

Every quantized example I have run so far seems to have a 1024-token limit. For example, a prompt of about 1802 tokens fails:

cargo run --example quantized-qwen3 --release --features cuda,cudnn -- --which 4b   --prompt "1802tokens later..."
Error: shape mismatch on target dim, dst: 1024, src: 1802 + 0
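
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: the destination buffer (perhaps a preallocated KV cache) has 1024 positions along the sequence dimension, and the 1802-token prompt is written into it in one go at offset 0. The sketch below uses only the Rust standard library to mimic that kind of bounds check. `FixedCache` and its `append` method are hypothetical names for illustration, not candle types, and the message format is simply copied from the report above.

```rust
/// Hypothetical stand-in for a preallocated cache; NOT candle's actual type.
struct FixedCache {
    data: Vec<u32>,
    capacity: usize,
}

impl FixedCache {
    fn new(capacity: usize) -> Self {
        Self { data: Vec::with_capacity(capacity), capacity }
    }

    /// Append `src` at the current offset, refusing writes that would
    /// run past the fixed capacity.
    fn append(&mut self, src: &[u32]) -> Result<(), String> {
        let offset = self.data.len();
        if offset + src.len() > self.capacity {
            // Message format mirrors the one in the report (assumed layout:
            // dst = capacity, src = incoming length + current offset).
            return Err(format!(
                "shape mismatch on target dim, dst: {}, src: {} + {}",
                self.capacity,
                src.len(),
                offset
            ));
        }
        self.data.extend_from_slice(src);
        Ok(())
    }
}

fn main() {
    let mut cache = FixedCache::new(1024);
    let prompt = vec![0u32; 1802]; // placeholder token ids
    match cache.append(&prompt) {
        Ok(()) => println!("prompt fits"),
        Err(e) => println!("Error: {e}"),
    }
}
```

If the real cause is similar, the fix would be on the example's side (a larger or growable buffer), not in the prompt itself.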

AlpineVibrations, Jul 02 '25 19:07