
Phi-3 model support in local generation (llama.cpp)

Open · Foul-Tarnished opened this issue 10 months ago

Phi-3 mini should be great on mobile, and apparently it's not that stupid for its size.

But the model doesn't seem to load: there's no real RAM usage (12 GB total, 7 GB available).

```
{name: Phi-3-mini-4k-instruct-q4.gguf, uri: /data/user/0/com.danemadsen.maid/cache/file_picker/1714279881171/Phi-3-mini-4k-instruct-q4.gguf, token: , randomSeed: true, useDefault: false, penalizeNewline: true, seed: 0, nKeep: 48, nPredict: 256, topK: 40, topP: 0.95, minP: 0.1, tfsZ: 1.0, typicalP: 1.0, temperature: 0.8, penaltyLastN: 64, penaltyRepeat: 1.1, penaltyPresent: 0.0, penaltyFreq: 0.0, mirostat: 0, mirostatTau: 5.0, mirostatEta: 0.1, nCtx: 4096, nBatch: 512, nThread: 8, promptFormat: 2}
Character loaded from MCF
Character reset
{name: Phi-3-mini-4k-instruct-q4.gguf, uri: /data/user/0/com.danemadsen.maid/cache/file_picker/1714279881171/Phi-3-mini-4k-instruct-q4.gguf, token: , randomSeed: true, useDefault: false, penalizeNewline: true, seed: 0, nKeep: 48, nPredict: 256, topK: 40, topP: 0.95, minP: 0.1, tfsZ: 1.0, typicalP: 1.0, temperature: 0.8, penaltyLastN: 64, penaltyRepeat: 1.1, penaltyPresent: 0.0, penaltyFreq: 0.0, mirostat: 0, mirostatTau: 5.0, mirostatEta: 0.1, nCtx: 4096, nBatch: 512, nThread: 8, promptFormat: 2}
File selected: /data/user/0/com.danemadsen.maid/cache/file_picker/1714279945804/Phi-3-mini-4k-instruct-q4.gguf
Loading model from File: '/data/user/0/com.danemadsen.maid/cache/file_picker/1714279945804/Phi-3-mini-4k-instruct-q4.gguf'
Initializing LLM
Model init in 0.130381 seconds
Prompting with llamacpp
n_ctx: 512
n_predict: 256
```
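
Two details in that log stand out: the settings request nCtx: 4096 but the runtime prints n_ctx: 512, and model init finishes in 0.13 seconds with no real RAM usage, which suggests the weights were never actually loaded. A minimal sketch for checking the same GGUF outside Maid, assuming the llama-cpp-python bindings (this is not what Maid uses internally; the path and prompt are illustrative):

```python
# Sanity check: load the same GGUF directly and confirm it initializes
# with the requested context size. Requires: pip install llama-cpp-python
from llama_cpp import Llama

MODEL_PATH = "Phi-3-mini-4k-instruct-q4.gguf"  # illustrative path

llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=4096,    # Maid's settings ask for 4096, but its log prints n_ctx: 512
    n_threads=8,
    verbose=True,  # prints load progress, so a silent failure is visible
)

print("context size:", llm.n_ctx())  # should report 4096 if the request was honored

out = llm("Say hello in one short sentence.", max_tokens=32)
print(out["choices"][0]["text"])
```

If this loads fine and reports 4096, the problem would be in how Maid passes its settings through to its bundled llama.cpp rather than in the model file itself.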

Apparently upstream llama.cpp supports Phi-3, but only the 4k-context variant for now: https://github.com/ggerganov/llama.cpp/issues/6849

Foul-Tarnished · Apr 28 '24

Yeah, Phi-3 definitely works in the latest version. I tested it out late last week.

danemadsen · Apr 28 '24

> Yeah, Phi-3 definitely works in the latest version. I tested it out late last week.

Latest commit? Because the latest release is 2 weeks old.

And even when it worked directly in llama.cpp, it had issues (wrong end token, infinite generation, and others), so the llama.cpp library should be updated for correct Phi-3 usage.

Foul-Tarnished · May 08 '24
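
For reference, the end-token and infinite-generation symptoms described above match a known quirk of early Phi-3 GGUF conversions, where the chat template's `<|end|>` marker wasn't registered as an end-of-sequence token, so generation ran on past the reply. A minimal client-side workaround sketch, again assuming llama-cpp-python (the prompt text is illustrative):

```python
# Workaround sketch: stop explicitly on Phi-3's template end markers
# instead of relying on the GGUF's EOS metadata.
from llama_cpp import Llama

llm = Llama(model_path="Phi-3-mini-4k-instruct-q4.gguf", n_ctx=4096)

# Phi-3 instruct prompt format: <|user|> ... <|end|> <|assistant|>
prompt = "<|user|>\nWhy is the sky blue?<|end|>\n<|assistant|>\n"

out = llm(
    prompt,
    max_tokens=256,
    stop=["<|end|>", "<|endoftext|>"],  # cut generation off at the template markers
)
print(out["choices"][0]["text"])
```

Updating the bundled llama.cpp (or re-converting the GGUF with a fixed tokenizer config) would be the proper fix; explicit stop strings just keep generation bounded in the meantime.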

> > Yeah, Phi-3 definitely works in the latest version. I tested it out late last week.
>
> Latest commit? Because the latest release is 2 weeks old.
>
> And even when it worked directly in llama.cpp, it had issues (wrong end token, infinite generation, and others), so the llama.cpp library should be updated for correct Phi-3 usage.

Yeah, that sounds like an ongoing issue that's affecting all models. I know what I need to do to fix it now, but I can't until I'm home and can work on it.

danemadsen · May 08 '24