
Support -ins for alpaca model in tcp server

[Open] vonjackustc opened this issue • 5 comments

Changes multiple printf calls to fprintf(outstream, ...).

vonjackustc avatar Mar 21 '23 09:03 vonjackustc

@vonjackustc can you change the target branch to tcp_server?

tarruda avatar Mar 21 '23 09:03 tarruda

I did the same thing but on Windows. The socket stream here is a nice idea, but I used a thread with a ThreadSafeQueue to implement it.

KyL0N avatar Mar 21 '23 10:03 KyL0N

@vonjackustc I missed these new extra printf statements in one of the recent rebases, just integrated your changes to the tcp_server branch, thanks for catching it.

tarruda avatar Mar 21 '23 11:03 tarruda

> @vonjackustc I missed these new extra printf statements in one of the recent rebases, just integrated your changes to the tcp_server branch, thanks for catching it.

You can change LLAMA_N_PARTS from { 5120, 2 } to { 5120, 1 } to support the quantized alpaca-13b-q4.bin here: https://github.com/antimatter15/alpaca.cpp#getting-started-13b But that would lose compatibility with the original LLaMA. Maybe you could make it configurable :D

vonjackustc avatar Mar 22 '23 01:03 vonjackustc

> You can change LLAMA_N_PARTS from { 5120, 2 } to { 5120, 1 } to support the quantized alpaca-13b-q4.bin here: https://github.com/antimatter15/alpaca.cpp#getting-started-13b But that would lose compatibility with the original LLaMA. Maybe you could make it configurable :D

I have no idea what these parameters mean, but isn't this what the --n_parts parameter does?

tarruda avatar Mar 22 '23 14:03 tarruda