instinct.cpp
`model-api` app for model serving, with embedding and reranker models
ggml with CUDA
llama.cpp server-cuda Dockerfile: https://github.com/ggerganov/llama.cpp/blob/a27152b602b369e76f85b7cb7b872a321b7218f7/.devops/llama-server-cuda.Dockerfile#L12