llama.cpp
examples : evaluate tokens in batches after swapping context
This new loop around llama_eval is somewhat redundant with the batching already done in the main loop, but without a larger refactor it is still necessary so that the print statements happen at the right times.
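For reference, a minimal sketch of what a batched evaluation loop after the context swap might look like. The variable names (embd, n_past, params.n_batch, params.n_threads) are assumptions modeled on the main example and are not quoted from this change; the older llama_eval API is assumed here.

```cpp
// Hypothetical sketch: after swapping the context, evaluate the pending
// tokens in chunks of n_batch rather than in one llama_eval call.
// embd holds the tokens queued for evaluation; n_past is the number of
// tokens already in the context. Both are assumed names.
for (int i = 0; i < (int) embd.size(); i += params.n_batch) {
    int n_eval = (int) embd.size() - i;
    if (n_eval > params.n_batch) {
        n_eval = params.n_batch;   // clamp the last chunk to what remains
    }
    if (llama_eval(ctx, &embd[i], n_eval, n_past, params.n_threads)) {
        fprintf(stderr, "%s : failed to eval\n", __func__);
        return 1;
    }
    n_past += n_eval;              // advance the context position
}
```

Evaluating in chunks keeps each llama_eval call within the configured batch size even when the swap leaves a long run of tokens to re-evaluate, while the surrounding loop can still print output at the points it did before.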
Tests passed yesterday. I just synced recent changes and added a comment.