`model-api` app for model serving with embedding and reranker

Open RobinQu opened this issue 1 year ago • 1 comment

Jul 01 '24 08:07 RobinQu

ggml with CUDA

For reference, the llama.cpp server-cuda Dockerfile: https://github.com/ggerganov/llama.cpp/blob/a27152b602b369e76f85b7cb7b872a321b7218f7/.devops/llama-server-cuda.Dockerfile#L12

Jul 03 '24 01:07 RobinQu