cloud11665

Results 16 issues of cloud11665

Also doing formatting, as it makes PTX easier to compare with the CUDA backend code. Color key:
- orange: instruction or immediate value
- purple: state space
- blue: operand or identifier
- green: ...
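Here is a rough sketch of how a color key like this could be applied to a single PTX instruction. Everything here is an assumption for illustration. The function name `classify_ptx` is hypothetical. Type suffixes such as `.f32` are lumped in with the instruction (orange), because the green category is cut off above. Only plain `opcode operands;` lines are handled, with no predicates, labels, or directives. The state-space names are the ones defined in the PTX ISA.

```python
import re

# PTX state spaces (per the PTX ISA); these tokens get the "purple" class.
STATE_SPACES = {".reg", ".sreg", ".const", ".global", ".local",
                ".param", ".shared", ".tex"}

# Immediates: decimal ints/floats, hex ints (0x...), PTX hex floats (0f... / 0d...).
IMMEDIATE_RE = re.compile(r"-?(?:0[xX][0-9a-fA-F]+|0[fFdD][0-9a-fA-F]+|\d+(?:\.\d+)?)")


def classify_ptx(line: str) -> list[tuple[str, str]]:
    """Split one PTX instruction into (token, color) pairs.

    Hypothetical scheme: orange = instruction or immediate,
    purple = state space, blue = operand or identifier.
    """
    line = line.strip().rstrip(";")
    opcode, _, rest = line.partition(" ")
    out = []
    # Opcode pieces keep their leading dot, e.g. "ld", ".global", ".f32".
    for piece in re.findall(r"\.?[^.]+", opcode):
        out.append((piece, "purple" if piece in STATE_SPACES else "orange"))
    # Operands are comma-separated; drop address brackets like [%rd2].
    for operand in rest.split(","):
        tok = operand.strip().strip("[]")
        if not tok:
            continue
        out.append((tok, "orange" if IMMEDIATE_RE.fullmatch(tok) else "blue"))
    return out


print(classify_ptx("ld.global.f32 %f1, [%rd2];"))
```

For `ld.global.f32 %f1, [%rd2];` this yields `ld` and `.f32` as orange, `.global` as purple, and `%f1` and `%rd2` as blue. A real formatter would need a proper PTX tokenizer rather than splitting on spaces and commas.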

I will post llama timing benchmarks soon.

Looks like https://bangplayer.live is down

Repro steps (on a 2x4090 machine):
- `CUDA_VISIBLE_DEVICES=1 NV=1 DEBUG=1 python3 -m examples.hlb_cifar10` -> gpu0 gets loaded
- `CUDA_VISIBLE_DEVICES=1 CUDA=1 DEBUG=1 python3 -m examples.hlb_cifar10` -> gpu1 gets loaded
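For reference, `CUDA_VISIBLE_DEVICES` lists which physical GPUs a process may see and in what order. With `CUDA_VISIBLE_DEVICES=1`, logical device 0 should therefore be physical gpu1, which matches what the `CUDA=1` run shows. The sketch below models that mapping for integer entries only, stopping at the first invalid index as the CUDA runtime documents. `visible_physical_gpus` is a hypothetical helper and is not taken from either backend's code. UUID entries (`GPU-...`) are not handled.

```python
import os
from typing import Mapping, Optional


def visible_physical_gpus(num_physical: int,
                          env: Optional[Mapping[str, str]] = None) -> list[int]:
    """Physical GPU indices visible to the process, in logical order.

    Hypothetical helper modelling CUDA_VISIBLE_DEVICES for integer
    entries only; UUID entries ("GPU-...") are not handled.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        # Variable unset: every physical GPU is visible, in natural order.
        return list(range(num_physical))
    visible = []
    for entry in raw.split(","):
        entry = entry.strip()
        # Stop at the first invalid index; later entries are ignored.
        if not entry.isdigit() or int(entry) >= num_physical:
            break
        visible.append(int(entry))
    return visible


# Logical device i maps to visible[i]; on a 2-GPU box with "1", logical 0 -> physical 1.
print(visible_physical_gpus(2, {"CUDA_VISIBLE_DEVICES": "1"}))
```

On the 2x4090 machine above, this returns `[1]`. Any backend that honors the variable should therefore open physical gpu1 as its first device.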

### What happened?

The limit is respected when requesting a chat completion, but for non-chat completions the model keeps generating tokens forever (until ctx-len is reached). With non-streaming there is...

bug-unconfirmed
high severity

Output of `corefreq-cli -k -n -B -n -M`:

```
Linux:
|- Release [6.8.0-57-generic]
|- Version [#59-Ubuntu SMP PREEMPT_DYNAMIC Sat Mar 15 17:40:59 UTC 2025]
|- Machine [x86_64]
Memory:
|- Total...
```

bugfix