Qingyou Meng
> ```shell
> python3 quantize.py --models-path models 7B
> ```

Runs as expected, great! Actually, the default model path is created with `os.path.join(os.getcwd(), "models")`, which is an absolute path.
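To illustrate the point about the default path being absolute, here is a minimal sketch. The helper names `default_models_path` and `resolve_models_path` are hypothetical and are not taken from `quantize.py`; only the `os.path.join(os.getcwd(), "models")` expression comes from the comment above.

```python
import os


def default_models_path() -> str:
    # Same expression as the default mentioned above: the current
    # working directory joined with "models". Because os.getcwd()
    # returns an absolute path, the result is absolute as well.
    return os.path.join(os.getcwd(), "models")


def resolve_models_path(path: str) -> str:
    # Hypothetical helper: turn a user-supplied value such as the
    # relative "models" from the command line into an absolute path.
    return os.path.abspath(path)


if __name__ == "__main__":
    print(os.path.isabs(default_models_path()))  # the default is absolute
    print(os.path.isabs("models"))               # the CLI value is relative
    print(resolve_models_path("models"))         # relative value, resolved
```

When the script is run from the repository root, a relative `models` argument and the default resolve to the same directory, which may explain why both invocations behave the same.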
Let me try to explain this.

1. 50% total usage on your 32 cores is equivalent to about 16 cores at 100% usage.
2. From my observations (4 ~ 6 threads), typical average per...
> @mqy You wrote that half the work is done on the main thread, can you elaborate on that ? The graph compute spins up n threads and those do...
Sorry for the delay, I'm back.

> ... then explicitly disable the ones that do not need `INIT` or `FINALIZE`.

MAY be better to keep the default behavior for some...
> I think that `n_tasks` should be removed from `ggml_tensor`.

Of course. I think `n_tasks` should belong to the compute facility, and ideally it should be migrated somewhere else. So,...
> is there any other data from the tensor would belong there?

No way.

> adding a simple `int n_tasks[GGML_MAX_NODES]`

Great! The `n_tasks` array is good enough.

> and eventually...
> Eventually, we will want a common interface to all compute backends so that users can use any of them interchangeably, without having to add specific code for each one....
> I am not favoring the implementation

Yes, you make good points. A general context is used for future extensibility, but it should be seen as over-design. If you have read...
> There is no need to worry about backwards compatibility

This is great. I'll rewrite this PR and show you later.
The PR description was updated. You may want to have a look at it.

Apart from `main` and `perplexity`, also tested:

- tests/test-grad0
- tests/test-opt
- examples/benchmark
- examples/baby-llama
-...