aub.ai
Migrate from llama_eval to llama_decode (including llama_batch usage)
Related to #15 #17
I'm updating aub.ai to use the latest version of llama.cpp. This update deprecates the `llama_eval` function in favor of `llama_decode`, which now requires the use of `llama_batch`. While I was aware of this upcoming change a while ago, I hadn't had the time to migrate away yet.
This issue has my highest priority; please be patient while I work out some technical difficulties.
At the time of this writing, it's the start of the evening here on a Sunday. I will continue development, but I honestly do not think I can finish the migration, compile and test for each platform, and package this up for a release on pub.dev (as the aub_ai package), nor as an app (e.g. via TestFlight), all in one go. Each step is doable, but all together it takes quite some time without a proper CI/CD setup (sorry, that comes later!). Please bear with me while I go through these steps; you can follow some of this work in this branch: https://github.com/BrutalCoding/aub.ai/tree/feature/sync-with-latest-llamacpp
Challenges:
- I've updated my code to use `llama_decode` and `llama_batch`, but the AI model is now outputting strange Unicode characters. This indicates an incorrect implementation on my side.
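One common cause of "strange Unicode characters" in this kind of setup (an assumption on my part, not a confirmed diagnosis of this particular bug) is printing each token piece the moment it arrives: a multi-byte UTF-8 character can be split across two consecutive token pieces, so flushing them separately emits invalid bytes. A minimal, llama.cpp-independent sketch of buffering bytes until a complete UTF-8 sequence is available:

```cpp
#include <cstddef>
#include <string>

// Length of a UTF-8 sequence from its lead byte (0 = continuation/invalid).
static int utf8_len(unsigned char c) {
    if (c < 0x80)          return 1; // ASCII
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0; // continuation or invalid lead byte
}

// Appends raw token-piece bytes to `pending`, then moves only *complete*
// UTF-8 sequences into `out`. A multi-byte character split across two
// token pieces stays buffered instead of being printed half-finished.
void flush_complete_utf8(std::string &pending, const std::string &piece,
                         std::string &out) {
    pending += piece;
    size_t i = 0;
    while (i < pending.size()) {
        int len = utf8_len((unsigned char) pending[i]);
        if (len == 0) { ++i; continue; }     // drop stray bytes
        if (i + len > pending.size()) break; // incomplete: keep buffered
        out.append(pending, i, (size_t) len);
        i += (size_t) len;
    }
    pending.erase(0, i);
}
```

Here `flush_complete_utf8` is a hypothetical helper name; the idea is simply to hold back trailing incomplete bytes between pieces.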
Tasks:
- Review example code utilizing `llama_decode` and `llama_batch` within the llama.cpp repository or related projects.
- Carefully analyze the differences between how I used `llama_eval` previously and the expected input/output structures for `llama_decode`.
- Debug and adjust my code to ensure correct tokenization, batching, and handling of model output.
Status update time!
Good news:
- Fixed compiler issues; the code has been migrated to `llama_decode` etc. 😄
- Gemma works
Bad news:
- I lied, kinda. I did migrate the code, but I'm missing a critical step somewhere.
- Assistant no longer generates an answer. The prompt tokenizes properly, and the "answer" (the exact same prompt/conversation) gets decoded with text_to_sentence_piece too, but I am missing a step somewhere.
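For reference, one step that is easy to miss after this migration (a guess; the actual bug may be elsewhere): decoding only *computes* logits for tokens whose logits flag was set, and the caller still has to sample a token from those logits and feed it back in a fresh one-token batch at the next position. The simplest sampler, greedy (argmax) selection over a plain logits vector:

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Greedy sampling: the next token id is the index of the highest logit.
// In a generation loop, the returned token is appended to the output and
// decoded again at the next position until an end-of-sequence token appears.
int32_t greedy_sample(const std::vector<float> &logits) {
    return (int32_t) std::distance(
        logits.begin(),
        std::max_element(logits.begin(), logits.end()));
}
```

If this feedback loop (sample, append, decode the new token) is absent, the prompt will tokenize and decode fine yet no answer will ever be produced, which matches the symptom above.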
Will jump on this project again this weekend; let's see if I can solve it.