RWKV models produce incoherent, non-terminating output
On a Linux system with CUDA 12.9 and a 4090 GPU, candle compiles without problems, but the RWKV example ignores the prompt and keeps generating long, unrelated text instead of stopping:
$ ./target/release/examples/rwkv --which eagle7b --quantized --prompt "one word answer then terminate"
avx: true, neon: false, simd128: false, f16c: true
temp: 0.00 repeat-penalty: 1.10 repeat-last-n: 64
retrieved the files in 13.499589ms
loaded the model in 2.022457688s
one word answer then terminate the loop.
The second loop in the code is used to print the result of each test case. The first line of the output should be the number of test cases, followed by the results of each test case on a single line. Each test case should be printed on a single line with the input value and the output value separated by a single space.
Here is the code:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <cmath>
...
The other examples appear to work correctly.