maid
Phi-3 models support in local generation (llama.cpp)
Phi-3 mini should be great on mobile, and apparently it's not that stupid for its size
But the model doesn't seem to load, no real RAM usage (12 GB total, 7 GB available)
{name: Phi-3-mini-4k-instruct-q4.gguf, uri: /data/user/0/com.danemadsen.maid/cache/file_picker/1714279881171/Phi-3-mini-4k-instruct-q4.gguf, token: , randomSeed: true, useDefault: false, penalizeNewline: true, seed: 0, nKeep: 48, nPredict: 256, topK: 40, topP: 0.95, minP: 0.1, tfsZ: 1.0, typicalP: 1.0, temperature: 0.8, penaltyLastN: 64, penaltyRepeat: 1.1, penaltyPresent: 0.0, penaltyFreq: 0.0, mirostat: 0, mirostatTau: 5.0, mirostatEta: 0.1, nCtx: 4096, nBatch: 512, nThread: 8, promptFormat: 2}
Character loaded from MCF
Character reset
{name: Phi-3-mini-4k-instruct-q4.gguf, uri: /data/user/0/com.danemadsen.maid/cache/file_picker/1714279881171/Phi-3-mini-4k-instruct-q4.gguf, token: , randomSeed: true, useDefault: false, penalizeNewline: true, seed: 0, nKeep: 48, nPredict: 256, topK: 40, topP: 0.95, minP: 0.1, tfsZ: 1.0, typicalP: 1.0, temperature: 0.8, penaltyLastN: 64, penaltyRepeat: 1.1, penaltyPresent: 0.0, penaltyFreq: 0.0, mirostat: 0, mirostatTau: 5.0, mirostatEta: 0.1, nCtx: 4096, nBatch: 512, nThread: 8, promptFormat: 2}
File selected: /data/user/0/com.danemadsen.maid/cache/file_picker/1714279945804/Phi-3-mini-4k-instruct-q4.gguf
Loading model from File: '/data/user/0/com.danemadsen.maid/cache/file_picker/1714279945804/Phi-3-mini-4k-instruct-q4.gguf'
Initializing LLM
Model init in 0.130381 seconds
Prompting with llamacpp
n_ctx: 512
n_predict: 256
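Side note: the log shows n_ctx: 512 even though the settings above ask for nCtx: 4096, so it looks like the context size isn't being forwarded to llama.cpp. If I'm reading the C API right, it has to be set on llama_context_params before the context is created. A minimal sketch against the upstream C API (the helper name is hypothetical, this is not maid's actual wrapper code):

```cpp
#include <cstdint>

#include "llama.h"

// Hypothetical helper for illustration: create a context with the user's
// settings instead of llama.cpp's defaults. Field names are from llama.h.
llama_context * make_ctx(llama_model * model, uint32_t n_ctx, uint32_t n_batch, uint32_t n_threads) {
    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx           = n_ctx;      // e.g. 4096 from the settings dump above
    cparams.n_batch         = n_batch;    // e.g. 512
    cparams.n_threads       = n_threads;  // e.g. 8
    cparams.n_threads_batch = n_threads;
    return llama_new_context_with_model(model, cparams);
}
```

Without that, llama.cpp silently falls back to its default context of 512, which is what the log shows.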
Apparently upstream llama.cpp supports Phi-3, but only the 4k context variant for now. https://github.com/ggerganov/llama.cpp/issues/6849
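For reference, the Phi-3 instruct GGUFs expect this chat template per the model card (whether maid's promptFormat: 2 maps to it is an assumption on my part):

```
<|user|>
{prompt}<|end|>
<|assistant|>
{response}<|end|>
```

Generation is supposed to stop at <|end|>, which is relevant to the end-token issues mentioned further down.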
Yeah phi 3 definitely works in the latest version. I tested it out late last week
Latest commit? Because the latest release is 2 weeks old.
And even though it worked directly on llama.cpp, it had issues (end token, infinite generation, and others), so the llama.cpp lib should be updated for correct Phi-3 usage.
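To illustrate the kind of workaround I mean until the bundled llama.cpp catches up: on the app side the generation loop can stop on the model's EOS token and also treat a literal <|end|> in the output as a stop sequence, since early Phi-3 GGUFs don't mark it as end-of-generation in their metadata. A rough sketch against the llama.cpp C API (signatures differ between releases, and this is not maid's code):

```cpp
#include <string>

#include "llama.h"

// Returns true when generation should stop for the sampled token `tok`.
// `tail` is a small rolling buffer of recently decoded text kept by the caller.
bool should_stop(const llama_model * model, llama_token tok, std::string & tail) {
    if (tok == llama_token_eos(model)) {
        return true;  // the model emitted a proper end-of-sequence token
    }

    char buf[64];
    // Caveat: llama_token_to_piece's signature has changed across releases;
    // this matches the variant that takes the model pointer.
    const int n = llama_token_to_piece(model, tok, buf, (int) sizeof(buf));
    if (n > 0) {
        tail.append(buf, (size_t) n);
        if (tail.find("<|end|>") != std::string::npos) {
            return true;  // Phi-3's end-of-turn marker emitted as plain text
        }
        if (tail.size() > 32) {
            tail.erase(0, tail.size() - 32);  // keep the rolling buffer small
        }
    }
    return false;
}
```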
Yeah, that sounds like an ongoing issue that's affecting all models. I know what I need to do to fix it now, but I can't until I'm home and can work on it.