Refactor most code in main.cpp into a separate module (preparing to implement TCP mode)
The goal of this refactor is to allow reusing the model execution code while interacting over streams other than stdin/stdout.
In my case, I'd like to implement a simple TCP server (enabled via a command-line option) that runs llama_main for each new connection, with each connection handled in a child process via fork() (a minimal sketch of this loop follows the list below). This would bring a few benefits:
- Loading model weights can be very slow, so in TCP mode we can load them once before listening. Each new connection is handled in a forked process, which inherits the parent's memory (and therefore doesn't have to reload the model).
- We can quickly start a new context by opening a new TCP socket. New connections will also be able to specify parameters such as the seed and prompt.
- It becomes easier to wrap this into a REST/HTTP server
- It can be more convenient on a LAN where a powerful machine acts as the model server.
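
To make the intended design concrete, here is a minimal sketch of the fork()-per-connection loop I have in mind. It is only an illustration: the `llama_load_model` helper and the `llama_main` call shown in comments are assumptions about the eventual API, not code from this PR, and the port number is arbitrary.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <csignal>
#include <cstdio>

int main() {
    // Load the model once, before listen(): the slow step happens here, and
    // every forked child inherits the weights via copy-on-write.
    // llama_model model = llama_load_model("ggml-model-f16.bin"); // assumed helper

    int srv = socket(AF_INET, SOCK_STREAM, 0);
    int yes = 1;
    setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    sockaddr_in addr = {};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(8080);
    if (bind(srv, (sockaddr *) &addr, sizeof(addr)) != 0 || listen(srv, 16) != 0) {
        perror("bind/listen");
        return 1;
    }

    signal(SIGCHLD, SIG_IGN); // don't leave zombie children around

    for (;;) {
        int conn = accept(srv, nullptr, nullptr);
        if (conn < 0) {
            continue;
        }
        pid_t pid = fork();
        if (pid == 0) {
            // child: the socket becomes this session's input/output streams
            close(srv);
            FILE *in  = fdopen(conn, "r");
            FILE *out = fdopen(dup(conn), "w");
            // llama_main(params, model, in, out, stderr); // assumed signature
            fclose(in);
            fclose(out);
            _exit(0);
        }
        close(conn); // parent: the child owns the fd now, keep accepting
    }
}
```

With something like this running, starting a fresh context is just opening a new connection, e.g. `nc localhost 8080` from another machine on the LAN.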
If this PR is accepted, I will follow up with a PR that implements the TCP server command-line option.
This PR is simpler to review than it appears. Just look at the commits individually (most of the additions/deletions happen in the first commit, where main.cpp is simply renamed to llama.cpp).
How does this PR tie into the current active refactor here: #77?
I was not aware of that PR; I should have searched for it first. The only reason I created this PR is that I had a clear vision of how to implement a TCP server mode in llama.cpp. Honestly not sure what to do, should I close this PR?
Not my call, but you could review the other PR with your insight :)
I had a quick look and it seems that the goal in #77 is to make llama.cpp embeddable as a library, which requires modifying/refactoring more than what I do here.
This PR has no such goals and makes almost no changes to existing code. It can be summarized as:
- Most of the code in main.cpp was moved to llama.cpp. No existing functions were split; I only created llama_main, which contains most of the code of the old main function.
- llama_main now accepts the following as arguments (a rough sketch of the resulting signature follows this list):
- parsed parameters
- preloaded model
- input/output/error streams, which are used instead of the hardcoded stdin/stdout/stderr
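
For illustration only, the resulting signature might look roughly like this; the parameter names and types below are my shorthand, not copied from the diff:

```cpp
// Hypothetical sketch of the refactored entry point (illustrative names/types):
int llama_main(
    gpt_params  params,     // parameters already parsed from argv
    gpt_vocab   vocab,      // preloaded vocabulary
    llama_model model,      // preloaded weights, loaded once by the caller
    FILE *      instream,   // replaces the hardcoded stdin
    FILE *      outstream,  // replaces the hardcoded stdout
    FILE *      errstream); // replaces the hardcoded stderr
```

Because the caller supplies the streams, a TCP server (or any other front end) can pass socket-backed FILE handles instead of the process's standard streams.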
@tarruda Adding a TCP server would be awesome! Please keep doing this - for now, do it on a branch in this repo as you find best. Just invited you as a collaborator. I will review #77 very soon and merge it first. After that, we will update your changes to fit the C-style API
Closing in favor of #278