Qingyou Meng

Results 10 issues of Qingyou Meng

In interactive mode: ``` Bob: Sure. The largest city in Europe is Moscow, the capital of Russia. User: xxx ``` Press CTRL+C: the program exits, but the terminal color still remains...

In `convert-pth-to-ggml.py`, `dir_model` is something like `models/7B` or `models/7B/`. `tokenizer.model` is expected under the model's parent dir. When `dir_model` is a symlink, `f"{dir_model}/../tokenizer.model"` would not be found. Let's use the model's...

bug
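The symlink problem described in the `convert-pth-to-ggml.py` entry above can be reproduced in a few lines. The sketch below is an illustration only, not the code from the PR: `tokenizer_path` and `demo` are hypothetical names, and taking the textual parent of `dir_model` is just one plausible reading of the truncated fix. It shows that the operating system resolves `link/..` to the parent of the symlink's target, while `os.path.dirname(os.path.normpath(...))` stays beside the link itself.

```python
import os
import tempfile


def tokenizer_path(dir_model):
    """Hypothetical helper: locate tokenizer.model beside dir_model."""
    # normpath drops a trailing slash; dirname then takes the textual parent
    # without following a symlink at dir_model.
    parent = os.path.dirname(os.path.normpath(dir_model))
    return os.path.join(parent, "tokenizer.model")


def demo():
    """Return (naive_found, lexical_found) for a symlinked model dir."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "storage", "7B")  # where the weights really live
        models = os.path.join(root, "models")
        os.makedirs(target)
        os.makedirs(models)
        # tokenizer.model sits next to the symlink, not next to its target
        open(os.path.join(models, "tokenizer.model"), "w").close()
        link = os.path.join(models, "7B")
        os.symlink(target, link)
        # The OS follows the link first, so ".." lands in storage/, not models/
        naive_found = os.path.exists(f"{link}/../tokenizer.model")
        lexical_found = os.path.exists(tokenizer_path(link + "/"))
        return naive_found, lexical_found


if __name__ == "__main__":
    print(demo())
```

The lexical approach has a trade-off: it assumes the tokenizer lives beside the path the user typed, so it would miss a tokenizer stored beside the symlink's target instead.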

Just several minor cleanups. 1. Mac (Intel) related: * `$(UNAME_M)` shows "x86-64". * `shell sysctl -n hw.optional.arm64` outputs an error that should be ignored. * Add additional comment on `-framework...

# Current Behavior `./examples/chatLLaMa`: after about 30 rounds of conversation, the program quits with `Segmentation fault: 11`. I tried again, entering the last question, but couldn't reproduce it. # Environment and Context * Physical...

bug
duplicate
model

The original motivation behind this PR is to try balancing performance and energy use. Generally speaking, I observed no obvious speedup or slowdown. Main ideas: 1. Spin + pause...

# Introduction MUL_MAT takes most of the compute time (about 95%). So to speed up llama, we have to focus on MUL_MAT. BLAS, as one of the fastest MUL_MAT solutions...

performance
high priority
threading

Try to resolve https://github.com/ggerganov/ggml/issues/284 - comment warning for `GGML_TASK_FINALIZE` in `ggml.h` - removed code using `GGML_TASK_FINALIZE` in `ggml.c`, including that in `ggml_graph_compute`. - print a warning message in `ggml_compute_forward` to warn...

performance
high priority

Try to resolve https://github.com/ggerganov/ggml/issues/287 ### Intro The design is a bit different from the suggested one: the buffer type is given a generalized name, `ggml_cgraph_context`. ``` struct ggml_cgraph_context { size_t work_size; void...

This commit roughly enables `mps` for d2l's PyTorch implementation. Tested on macOS 13.2.1 (Intel chip), with PyTorch 1.12 and 1.13. You may want to have a look at the...
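For readers unfamiliar with the `mps` backend, device selection in PyTorch 1.12 and later might look roughly like the sketch below. This is an assumption-laden illustration, not the code from the commit: `pick_device_name` is a hypothetical helper, and the fallback order (MPS, then CUDA, then CPU) is chosen here for the example. It uses `torch.backends.mps.is_available()` and `torch.cuda.is_available()`, and falls back to `"cpu"` when PyTorch is not installed or predates the MPS backend.

```python
def pick_device_name():
    """Hypothetical helper: return "mps", "cuda", or "cpu" for this machine."""
    try:
        import torch
    except ImportError:
        # No PyTorch installed; only the CPU name makes sense.
        return "cpu"
    # torch.backends.mps exists from PyTorch 1.12 onward; guard older versions.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(pick_device_name())
```

With PyTorch installed, the returned name can be passed to `torch.device(...)` and then to `.to(device)` on tensors and modules.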