Qingyou Meng issues

Results 10 issues of Qingyou Meng

Fixed color reset problem in interactive mode.

In interactive mode:
```
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User: xxx
```
Press CTRL+C, the program exits, but terminal color still remains...
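The symptom described here (colored input left active after CTRL+C) is usually avoided by emitting the ANSI reset sequence on every exit path. The project in question is written in C/C++, so the following Python snippet is only a minimal sketch of the general pattern, not the actual fix: `chat_loop`, `read_line`, and the choice of green for user input are hypothetical names and values used for illustration.

```python
import sys

ANSI_RESET = "\x1b[0m"   # standard SGR sequence that restores default colors
ANSI_GREEN = "\x1b[32m"  # hypothetical color used for user input


def chat_loop(read_line, out=sys.stdout):
    """Read user lines in color and always restore the terminal color on exit."""
    try:
        while True:
            out.write(ANSI_GREEN)   # user input is shown in color
            out.flush()
            line = read_line()      # CTRL+C raises KeyboardInterrupt here
            out.write(ANSI_RESET)
            if not line:            # empty string means end of input
                break
            out.write(f"Bob: (reply to {line.strip()!r})\n")
    except KeyboardInterrupt:
        out.write("\n")
    finally:
        out.write(ANSI_RESET)       # runs on normal exit and on CTRL+C
        out.flush()
```

Calling `chat_loop(sys.stdin.readline)` in a terminal and pressing CTRL+C while typing should leave the prompt in its default color, because the `finally` branch writes the reset sequence regardless of how the loop ends.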

Fixed tokenizer.model not found error when model dir is symlink

In `convert-pth-to-ggml.py`, `dir_model` is something like `models/7B` or `models/7B/`, and `tokenizer.model` is expected under the model's parent dir. When `dir_model` is a symlink, `f"{dir_model}/../tokenizer.model"` is not found. Let's use the model's...
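The failure comes from how the operating system resolves `..`: it follows the symlink first, so `models/7B/..` points at the parent of the symlink target rather than at `models`. The snippet above is truncated before it states the chosen fix, so the following is only one possible approach, assuming the intent is to take the parent of the path as written. The helper name `tokenizer_path` and the directory layout in `demo` are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def tokenizer_path(dir_model: str) -> Path:
    """Return <parent of dir_model>/tokenizer.model without following symlinks."""
    # Path() drops a trailing slash and .parent is computed purely from the
    # text of the path, so a symlinked model dir still yields "models".
    return Path(dir_model).parent / "tokenizer.model"


def demo():
    """Compare the '..'-based lookup with the lexical one for a symlinked dir."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        real = root / "storage" / "7B"       # where the weights really live
        real.mkdir(parents=True)
        models = root / "models"
        models.mkdir()
        (models / "tokenizer.model").write_bytes(b"")
        (models / "7B").symlink_to(real, target_is_directory=True)

        dir_model = str(models / "7B") + "/"
        naive = f"{dir_model}/../tokenizer.model"  # resolves to storage/tokenizer.model
        return os.path.exists(naive), tokenizer_path(dir_model).exists()
```

On a system that supports symlinks, `demo()` reports that the `..`-based path does not exist while the lexical parent lookup does.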

bug

Makefile: slightly cleanup for Mac Intel; replace './main -h' with echo.

Just several minor cleanups. 1. Mac (Intel) related: * `$(UNAME_M)` shows "x86_64". * `shell sysctl -n hw.optional.arm64` outputs an error that should be ignored. * Add additional comment on `-framework...

[mqy] ./examples/chatLLaMa: line 53: 33476 Segmentation fault: 11

# Current Behavior Running `./examples/chatLLaMa`: after about 30 rounds of conversation, the program quits with `Segmentation fault: 11`. I tried again, entering the last question, but could not reproduce it. # Environment and Context * Physical...

bug

duplicate

model

[Proof of concept] threading: preemptive, local/global

The original motivation for this PR is to try balancing performance against energy use. Generally speaking, I observed no obvious speedup or slowdown. Main ideas: 1. Spin + pause...

Fine tune MUL_MAT, new threading (spin+wait/notify), speedup q_f32 BLAS by splitting COMPUTE stage

# Introduction MUL_MAT takes most of the compute time (about 95%), so to speed up llama we have to focus on MUL_MAT. BLAS, as one of the fastest MUL_MAT solution...

performance

high priority

threading

deprecate GGML_TASK_FINALIZE and cleanup

Try to resolve https://github.com/ggerganov/ggml/issues/284 - comment warning for `GGML_TASK_FINALIZE` in `ggml.h` - removed code using `GGML_TASK_FINALIZE` in `ggml.c`, including that in `ggml_graph_compute`. - print a warning message in `ggml_compute_forward` to warn...

performance

high priority
