Judd

Results 57 comments of Judd

Yes, some pre-processing rules are not implemented and are silently ignored, which may cause subtle differences. I may add these later; at present, I am busy adding more models.

This does not solve the problem. Maybe you could add another API with three additional parameters: `token_timestamps`, `split_on_word`, and `max_len`.

GPU acceleration is not supported yet.

@cagev It is OK now.

This is dedicated to those who are GPU-poor, but stay tuned. 😄

> @foldl I tried to build it with GPU support by using `cmake -B build-gpu -DGGML_CUDA=ON -DGGML_CUDA_F16=ON -DBUILD_SHARED_LIBS=ON` (it works fine for llama.cpp), but compilation fails with errors like that:...

@MoonRide303 This can now be built against CUDA, though I suspect only a few models work.
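
As a sketch, a CUDA build and test run might look like the following (the `-DGGML_CUDA=ON` flag is taken from the quoted command above; `model.bin` and the `-i` interactive flag are illustrative placeholders):

```
cmake -B build-cuda -DGGML_CUDA=ON
cmake --build build-cuda --config Release
build-cuda\bin\Release\main.exe -m model.bin -i
```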

How about `-ngl 10`? I have tested Vulkan & CUDA; it works with this model.
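
Assuming `-ngl` takes the number of layers to offload to the GPU (as in llama.cpp; an assumption here, and the model path is a placeholder), usage would look like:

```
build-cuda\bin\Release\main.exe -m model.bin -ngl 10 -i
```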

@MoonRide303 Sorry for the incorrect information. I have tested QWen2.5 7B & Llama3.1 8B with CUDA. Note: models with `lm_head` tied to the embedding matrix generally do not work.

```
build-cuda\bin\Release\main.exe -m...
```

`-l` (i.e. `--max_length`) should be used to reduce VRAM usage; `-c` is used by the context-extending method. The naming is a little confusing.
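
For example, capping the maximum length lowers the size of the KV cache and thus VRAM usage (a sketch; the value 2048 and the model path are illustrative):

```
build-cuda\bin\Release\main.exe -m model.bin -l 2048 -ngl 10
```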