Ma Mingfei


@slaren just updated the CMake compiler options: `-mamx-tile`, `-mamx-int8` and `-mamx-bf16`!
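
For context, those `-mamx-*` options also make GCC/Clang define matching feature-test macros, so the AMX code paths can be guarded at compile time. A minimal sketch (the guard name `GGML_AMX_EXAMPLE_ENABLED` is hypothetical; the PR's actual guards may differ):

```
// Sketch: compile-time guard based on the feature macros implied by the
// -mamx-tile / -mamx-int8 / -mamx-bf16 options (guard name is hypothetical).
#if defined(__AMX_TILE__) && defined(__AMX_INT8__) && defined(__AMX_BF16__)
#    define GGML_AMX_EXAMPLE_ENABLED 1
#else
#    define GGML_AMX_EXAMPLED_ENABLED 0
#endif
```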

@ggerganov could you please tell me why this CI failed? https://github.com/ggml-org/ci/tree/results/llama.cpp/28/cfc0ffdbfec9520a2c190d57025350229d340c/ggml-4-x86-cuda-v100

@slaren changing `is_host` to false for the AMX backend leads to a fault in `ggml_backend_sched_backend_id_from_cur` (log attached below). Do you have any insight into how to fix it? ``` llama_new_context_with_model:...

@slaren could you please review this one again? I just changed `ggml_backend_buft_is_host` to return false for the AMX backend.
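
For reference, a minimal sketch of what that change looks like, assuming the ggml-backend buffer-type callback interface of the time (not the PR's code verbatim):

```
// Sketch: the AMX buffer type reports is_host = false, because its buffers hold
// weights repacked into an AMX/VNNI tile layout that other backends cannot read
// as plain host memory.
static bool ggml_backend_amx_buffer_type_is_host(ggml_backend_buffer_type_t buft) {
    (void) buft;    // unused
    return false;
}
```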

@ggerganov replaced tabs with 4 spaces. Also, the branch has been rebased and squashed into one commit.

@nai-kon originally I wrote kernels for

```
//(type == GGML_TYPE_Q8_0) ||
//(type == GGML_TYPE_Q4_K) ||
//(type == GGML_TYPE_Q5_K) ||
//(type == GGML_TYPE_Q6_K) ||
//(type == GGML_TYPE_IQ4_XS);
```

but later on,...

@slaren the major problem I have with ggml-backend is that I couldn't figure out how to do padding with the AMX backend (when packing weights for AMX, e.g. VNNI...
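
To illustrate the padding problem, here is a rough sketch of the kind of rounding the packed AMX layout needs; the tile sizes and helper names are assumptions, not the PR's actual code:

```
#include <stddef.h>

#define AMX_TILE_M 16   // assumed rows per AMX tile
#define AMX_TILE_K 64   // assumed int8 elements per tile row

static size_t round_up(size_t x, size_t m) {
    return (x + m - 1) / m * m;
}

// Size in bytes of the packed buffer for a (rows x cols) int8 weight matrix:
// both dimensions are padded up to full tiles, and the padding must be zeroed.
static size_t amx_packed_size(size_t rows, size_t cols) {
    return round_up(rows, AMX_TILE_M) * round_up(cols, AMX_TILE_K);
}
```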

This PR also adds OpenMP support, since the original pthread sync is done via atomics, which have very high overhead on server CPUs (and the sync has to be...
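
Roughly, the difference is between every thread spinning on a shared atomic counter versus letting the OpenMP runtime handle the barrier. A simplified sketch (not the PR's code):

```
#include <omp.h>
#include <stdatomic.h>

// Old style (simplified): each thread bumps a shared counter and busy-waits,
// which causes heavy cache-line contention on many-core server CPUs.
static atomic_int n_ready;
static void sync_atomic(int n_threads) {
    atomic_fetch_add(&n_ready, 1);
    while (atomic_load(&n_ready) < n_threads) { /* spin */ }
}

// With OpenMP (simplified): the implicit barrier at the end of the parallel
// region replaces the hand-rolled spin-wait.
static void run_with_openmp(void (*kernel)(int ith, int nth), int n_threads) {
    #pragma omp parallel num_threads(n_threads)
    {
        kernel(omp_get_thread_num(), omp_get_num_threads());
    }
}
```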

> BTW why AMX will greatly improve next token latency?

I also wrote a VNNI kernel for GEMV cases.
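
A simplified sketch of the kind of VNNI GEMV inner loop mentioned above, using `_mm512_dpbusd_epi32`; this is an illustration, not the PR's kernel, and it assumes unsigned int8 activations, signed int8 weights, and `k` a multiple of 64:

```
#include <immintrin.h>
#include <stdint.h>

// One output element of the GEMV: dot product of a 1 x k activation row with
// one k-element weight column, using the AVX-512 VNNI dot-product instruction.
static int32_t vnni_dot_i8(const uint8_t * x, const int8_t * w, int k) {
    __m512i acc = _mm512_setzero_si512();
    for (int i = 0; i < k; i += 64) {
        __m512i vx = _mm512_loadu_si512((const void *)(x + i));
        __m512i vw = _mm512_loadu_si512((const void *)(w + i));
        // for each group of 4 bytes: acc += (u8)x * (s8)w, widened to int32
        acc = _mm512_dpbusd_epi32(acc, vx, vw);
    }
    return _mm512_reduce_add_epi32(acc);
}
```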