
Migrate from llama_eval to llama_decode (including llama_batch usage)

Open · BrutalCoding opened this issue 11 months ago · 1 comment

Related to #15 #17

I'm updating aub.ai to use the latest version of llama.cpp. That update deprecates the llama_eval function in favor of llama_decode, which requires the use of llama_batch. While I was aware of this upcoming change a while ago, I hadn't had the time to migrate yet.
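
For anyone following along, the change at the C API level looks roughly like the sketch below. This is a minimal sketch assuming the llama.cpp API as it stood around early 2024 (exact signatures vary between commits), not aub.ai's actual code:

```cpp
#include <vector>
#include "llama.h"

// Sketch of the eval -> decode migration, assuming the llama.cpp C API
// around early 2024 (exact signatures vary by commit).
static bool eval_tokens(llama_context * ctx, std::vector<llama_token> & tokens, int n_past) {
    // Old, deprecated call:
    //   llama_eval(ctx, tokens.data(), (int32_t) tokens.size(), n_past);

    // New call: wrap the token buffer in a llama_batch. For a single
    // sequence, llama_batch_get_one assigns positions n_past..n_past+n-1
    // and computes logits for the last token only.
    llama_batch batch = llama_batch_get_one(tokens.data(), (int32_t) tokens.size(), n_past, 0);
    return llama_decode(ctx, batch) == 0; // 0 means success
}
```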

This issue has my highest priority; please be patient while I work through some technical difficulties.

At the time of writing, it's the start of a Sunday evening here. I will continue development, but I honestly don't think I can finish the migration, compile and test for each platform, and package this up for a release on pub.dev (as the aub_ai package) or as an app (e.g. on TestFlight) in one go. Each step is doable, but all together it takes quite some time without a proper CI/CD setup (sorry, that comes later!). Please bear with me while I go through these steps; you can follow some of this work in this branch: https://github.com/BrutalCoding/aub.ai/tree/feature/sync-with-latest-llamacpp

Challenges:

  • I've updated my code to use llama_decode and llama_batch, but the AI model now outputs strange Unicode characters, which suggests an incorrect implementation on my side (a detokenization sketch follows below).
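
Not confirmed as the cause here, but a common source of mojibake after this migration is detokenization rather than decoding: a token piece is raw UTF-8 bytes, and a single character can span two tokens, so converting each piece to text on its own corrupts multi-byte sequences. A hedged sketch, assuming llama_token_to_piece as it looked in early 2024 (the signature has since grown extra parameters):

```cpp
#include <string>
#include "llama.h"

// Fetch one token's piece as raw bytes. Assumes the early-2024 signature
// llama_token_to_piece(model, token, buf, length), which returns the
// negated required size if the buffer is too small.
static std::string token_to_piece(const llama_model * model, llama_token tok) {
    std::string piece(8, '\0');
    int32_t n = llama_token_to_piece(model, tok, piece.data(), (int32_t) piece.size());
    if (n < 0) {
        piece.resize(-n);
        n = llama_token_to_piece(model, tok, piece.data(), (int32_t) piece.size());
    }
    piece.resize(n);
    return piece; // raw bytes, not necessarily a complete UTF-8 sequence
}
```

The key point is to append the raw bytes of every piece to one buffer and run the UTF-8 decode over that whole buffer (e.g. on the Dart side), instead of decoding per token.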

Tasks:

  • Review example code utilizing llama_decode and llama_batch within the llama.cpp repository or related projects.
  • Carefully analyze the differences between how I used llama_eval previously and the expected input/output structures for llama_decode.
  • Debug and adjust my code to ensure correct tokenization, batching, and handling of model output (see the decode-loop sketch after this list).
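
For reference while debugging, a minimal prompt-then-generate loop with an explicit llama_batch might look like this. It assumes the early-2024 llama.cpp API (llama_batch_init, per-token batch fields, llama_sample_token_greedy), is not aub.ai's code, and elides error handling:

```cpp
#include <vector>
#include "llama.h"

// Minimal greedy generation loop using llama_batch explicitly.
// Assumes the prompt fits in the batch capacity (512 here).
static void generate(llama_context * ctx, const std::vector<llama_token> & prompt, int n_predict) {
    const llama_model * model = llama_get_model(ctx);
    llama_batch batch = llama_batch_init(512, 0, 1);

    // Feed the prompt: one entry per token, and request logits only for
    // the last token -- forgetting batch.logits is a classic silent bug.
    batch.n_tokens = (int32_t) prompt.size();
    for (int32_t i = 0; i < batch.n_tokens; i++) {
        batch.token[i]     = prompt[i];
        batch.pos[i]       = i;          // absolute position in the sequence
        batch.n_seq_id[i]  = 1;
        batch.seq_id[i][0] = 0;
        batch.logits[i]    = false;
    }
    batch.logits[batch.n_tokens - 1] = true;
    llama_decode(ctx, batch);

    int n_past = batch.n_tokens;
    for (int i = 0; i < n_predict; i++) {
        // Greedy-sample the next token from the logits of the last entry.
        float * logits = llama_get_logits_ith(ctx, batch.n_tokens - 1);
        const int n_vocab = llama_n_vocab(model);
        std::vector<llama_token_data> cand(n_vocab);
        for (llama_token t = 0; t < n_vocab; t++) {
            cand[t] = { t, logits[t], 0.0f };
        }
        llama_token_data_array arr = { cand.data(), cand.size(), false };
        llama_token tok = llama_sample_token_greedy(ctx, &arr);
        if (tok == llama_token_eos(model)) break;
        // (detokenize tok here, e.g. with llama_token_to_piece)

        // Feed the sampled token back as a single-entry batch.
        batch.n_tokens     = 1;
        batch.token[0]     = tok;
        batch.pos[0]       = n_past++;   // keep advancing the position
        batch.n_seq_id[0]  = 1;
        batch.seq_id[0][0] = 0;
        batch.logits[0]    = true;
        llama_decode(ctx, batch);
    }
    llama_batch_free(batch);
}
```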

— BrutalCoding, Mar 03 '24 11:03

Status update time!

Good news:

  • Fixed compiler issues; the code has been migrated to llama_decode etc. 😄
  • Gemma works

Bad news:

  • I lied, kinda. I did migrate the code, but I'm missing a critical step somewhere.
  • The assistant no longer generates an answer. The prompt tokenizes properly, and the "answer" (the exact same prompt/convo) gets decoded with text_to_sentence_piece too, so the missing step is somewhere in between (see the diagnostic sketch below).
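
Two usual suspects for this exact symptom, offered as guesses rather than a diagnosis: llama_decode's return value getting swallowed, or logits never being requested for the last prompt token, so sampling reads stale data. A small diagnostic sketch under the same early-2024 API assumptions as above:

```cpp
#include <cstdio>
#include "llama.h"

// Diagnostic for "prompt tokenizes fine but the assistant never answers".
// Not a fix, just two checks; return-code meanings are as of early 2024.
static bool check_decode(llama_context * ctx, llama_batch & batch) {
    const int ret = llama_decode(ctx, batch);
    if (ret != 0) {
        // Around this era: 1 means no KV cache slot was found,
        // negative values are hard errors.
        std::fprintf(stderr, "llama_decode failed: %d\n", ret);
        return false;
    }
    // batch.logits is null for llama_batch_get_one (last-token logits are
    // implied); for a manually filled batch the last entry must be set.
    if (batch.logits && !batch.logits[batch.n_tokens - 1]) {
        std::fprintf(stderr, "no logits requested for the last token\n");
        return false;
    }
    return true;
}
```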

I'll jump on this project again this weekend; let's see if I can solve it.

— BrutalCoding, Mar 07 '24 02:03