
Break down main function in llama-server

Open · ericcurtin opened this issue 8 months ago · 2 comments

The llama-server main function is getting meaty; this issue is about breaking it down into smaller functions.

ericcurtin · May 10 '25 12:05
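
For illustration only, here is a minimal sketch of what a decomposed main() could look like. The helper and struct names (parse_cli_args, init_backend_and_model, server_params, server_state, etc.) are hypothetical and do not correspond to the actual llama.cpp sources; the stubs simply stand in for logic that currently lives inline in main().

```cpp
// Hypothetical sketch: main() reduced to a sequence of named steps.
// None of these helpers exist under these names in llama.cpp today.
#include <cstdio>

struct server_params { int port = 8080; /* plus other parsed CLI options */ };
struct server_state  { /* model, slots, HTTP listener, ... */ };

// Trivial stubs so the sketch compiles; the real bodies would hold the
// code that is currently inlined in main().
static bool parse_cli_args(int /*argc*/, char ** /*argv*/, server_params & /*params*/) { return true; }
static bool init_backend_and_model(const server_params & /*params*/, server_state & /*state*/) { return true; }
static void register_http_routes(server_state & /*state*/) {}
static int  run_event_loop(server_state & /*state*/) { return 0; }

int main(int argc, char ** argv) {
    server_params params;
    if (!parse_cli_args(argc, argv, params)) {
        return 1;
    }

    server_state state;
    if (!init_backend_and_model(params, state)) {
        fprintf(stderr, "failed to initialize backend or load model\n");
        return 1;
    }

    register_http_routes(state);
    return run_event_loop(state);
}
```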

Incomplete

ericcurtin · May 10 '25 12:05

Before going further, I think it's better to discuss a plan rather than diving into the code.

While working on https://github.com/ggml-org/llama.cpp/pull/13400#issuecomment-2866290941, I also thought about refactoring server.cpp into small components. This should be done in a way that makes it easy to route requests to multiple models on the same server instance.

For now, the simplest task is of course to abstract out the creation of the HTTP server. A second task could be to move all the HTTP handlers into a completely separate file. The main component, server_context, may also need to be moved to a dedicated file.

ngxson · May 10 '25 12:05
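
As a rough sketch of the split described above (all file, class, and function names here are hypothetical, not the actual llama.cpp layout, and the bodies are placeholders so the example compiles), the pieces could look like this:

```cpp
// Hypothetical sketch of the proposed component split; names and file
// boundaries do not reflect the actual llama.cpp sources.
#include <functional>
#include <map>
#include <string>

// --- would live in server-http.h: HTTP layer, knows nothing about models ---
struct http_request  { std::string path; std::string body; };
struct http_response { int status = 200; std::string body; };

using http_handler = std::function<http_response(const http_request &)>;

class server_http {
public:
    void route(const std::string & path, http_handler handler) {
        handlers[path] = std::move(handler);
    }
    // The real implementation would wrap the cpp-httplib listen loop;
    // here it is only a placeholder so the sketch compiles.
    bool listen(const std::string & /*host*/, int /*port*/) { return true; }
private:
    std::map<std::string, http_handler> handlers;
};

// --- would live in server-context.h: model/slot state, no HTTP types ---
struct server_context {
    bool load_model(const std::string & /*model_path*/) { return true; }
    std::string completion(const std::string & prompt) { return "echo: " + prompt; }
};

// --- would live in server-handlers.cpp: glue between routes and context ---
// Keeping the handlers separate is what would later make it easy to pick
// one of several server_context instances per request (multi-model routing).
static void register_handlers(server_http & http, server_context & ctx) {
    http.route("/completion", [&ctx](const http_request & req) {
        http_response res;
        res.body = ctx.completion(req.body);
        return res;
    });
    // ... /health, /v1/chat/completions, etc.
}

// --- server.cpp: main() shrinks to wiring the pieces together ---
int main() {
    server_context ctx;
    if (!ctx.load_model("model.gguf")) {
        return 1;
    }
    server_http http;
    register_handlers(http, ctx);
    return http.listen("127.0.0.1", 8080) ? 0 : 1;
}
```

With this kind of layering, multi-model serving could be added by keeping a map from model name to server_context and having the handler layer pick the right context from the request, without touching the HTTP abstraction itself.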