Yes, some pre-processing rules are ignored (not yet implemented), which may cause subtle differences. I may add them later; at present I am busy adding more models.
This does not solve the problem. Maybe you could add another API with three additional parameters: `token_timestamps`, `split_on_word`, and `max_len`.
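A rough sketch of what such an API could look like — the struct and function names here are purely illustrative, not the library's actual interface; only the three parameter names come from the suggestion above:

```
// Hypothetical shape of the proposed API (illustrative names only).
// It mirrors the existing transcription entry point and adds the
// three parameters requested above.
struct transcribe_params {
    bool token_timestamps = false; // emit a timestamp for every token
    bool split_on_word    = false; // split segments on word boundaries
    int  max_len          = 0;     // max segment length (0 = unlimited)
};

// A second entry point alongside the existing one, so current callers
// are unaffected.
int transcribe(const float * samples, int n_samples,
               const transcribe_params & params);
```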
GPU acceleration is not supported yet.
@cagev It is OK now.
This is dedicated to those who are GPU-poor, but stay tuned. 😄
> @foldl I tried to build it with GPU support by using `cmake -B build-gpu -DGGML_CUDA=ON -DGGML_CUDA_F16=ON -DBUILD_SHARED_LIBS=ON` (it works fine for llama.cpp), but compilation fails with errors like this:...
@MoonRide303 This can now be built against CUDA, but I guess only a few models work.
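For reference, a minimal CUDA build along the lines of the quoted command (the `GGML_CUDA_F16` and shared-library flags from above are optional extras):

```
cmake -B build-cuda -DGGML_CUDA=ON
cmake --build build-cuda --config Release
```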
How about `-ngl 10`? I have tested both Vulkan & CUDA; it works with this model.
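That is, offload only part of the model to the GPU when VRAM is limited — something like the following, where `model.bin` is just a placeholder:

```
main -m model.bin -ngl 10
```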
@MoonRide303 Sorry for the incorrect information. I have tested Qwen2.5 7B & Llama3.1 8B with CUDA. Note: generally, models with `lm_head` tied to the embedding weights do not work.

```
build-cuda\bin\Release\main.exe -m...
```
Use `-l` (i.e. `--max_length`) to reduce VRAM usage. `-c` is used by the context-extending method. The naming is a bit confusing.
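So, to cut VRAM usage, cap the maximum length rather than touching `-c` — for example (the model file name is a placeholder, and 1024 is just an example cap):

```
main -m model.bin -l 1024
```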