Prompt token count (3784) exceeds batch capacity (2048)
[ERROR:flutter/runtime/dart_vm_initializer.cc(40)] Unhandled Exception: Generation error: LlamaException: Prompt token count (3784) exceeds batch capacity (2048)
Why is this happening? The context size is set sufficiently high. The batch size should only determine how many tokens go into each batch; if the prompt has more tokens than that, it should simply run multiple batches. The llama.cpp CLI handles this fine, right? Does this mean we have to set the batch size equal to the maximum context length just to get it to work?
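For context, this is roughly how the llama.cpp CLI avoids the error: it walks the tokenized prompt in `n_batch`-sized chunks and calls `llama_decode()` once per chunk, so `n_batch` only bounds tokens per call, not prompt length. A minimal sketch against the upstream C API (names are from `llama.h`, though exact signatures vary between versions; the `eval_prompt_chunked` helper is hypothetical):

```cpp
#include <algorithm>
#include <vector>
#include "llama.h"

// Evaluate `prompt_tokens` against `ctx` in chunks of at most `n_batch`
// tokens per llama_decode() call. Returns false if any decode fails.
bool eval_prompt_chunked(llama_context * ctx,
                         const std::vector<llama_token> & prompt_tokens,
                         int n_batch) {
    int n_past = 0;
    for (int i = 0; i < (int) prompt_tokens.size(); i += n_batch) {
        const int n_eval = std::min((int) prompt_tokens.size() - i, n_batch);

        // Build a batch holding just this chunk, positioned after the
        // tokens already in the KV cache.
        llama_batch batch = llama_batch_init(n_eval, /*embd=*/0, /*n_seq_max=*/1);
        for (int j = 0; j < n_eval; ++j) {
            batch.token[j]     = prompt_tokens[i + j];
            batch.pos[j]       = n_past + j;
            batch.n_seq_id[j]  = 1;
            batch.seq_id[j][0] = 0;
            batch.logits[j]    = false; // only need logits for the final token
        }
        batch.n_tokens = n_eval;
        batch.logits[n_eval - 1] = true;

        const int rc = llama_decode(ctx, batch);
        llama_batch_free(batch);
        if (rc != 0) {
            return false;
        }
        n_past += n_eval;
    }
    return true;
}
```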
Yes, that's correct. Please take a look at the examples folder.
I believe you should be able to set the context size higher than the batch capacity and still utilize the full context, right? If not, could you explain why?
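That matches my reading of the upstream API, where `n_ctx` and `n_batch` are independent settings. A minimal sketch (field and function names per `llama.h`; the Dart binding may expose these differently):

```cpp
// Create a context whose window (n_ctx) exceeds the per-call batch
// capacity (n_batch). With chunked decoding as sketched above, the full
// 4096-token context stays usable even though at most 512 tokens are
// submitted per llama_decode() call. `model` is assumed already loaded.
llama_context_params ctx_params = llama_context_default_params();
ctx_params.n_ctx   = 4096; // total context window / KV-cache size
ctx_params.n_batch = 512;  // max tokens per llama_decode() call
llama_context * ctx = llama_new_context_with_model(model, ctx_params);
```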