
How to run Llama-3 or Phi with more than 4096 prompt tokens?

Open baleksey opened this issue 1 year ago • 0 comments

Could you please show me an example where a Llama-3 model is used (ideally GGUF quantized) with an initial prompt longer than 4096 tokens, or better yet 16-64K tokens (for RAG)? Currently everything I try fails on this line:

```rust
let logits = model.forward(&input, 0); // input is > 4096 tokens
```

with this error:

```
Error: narrow invalid args start + len > dim_len: [4096, 64], dim: 0, start: 0, len:4240
```

Model used: https://huggingface.co/MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF
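For context on the shape of a possible workaround: the candle quantized-llama example feeds the prompt to `forward` together with an `index_pos` offset, and one common pattern for long prompts is to prefill in chunks, advancing `index_pos` by the chunk size each step (the `[4096, 64]` tensor in the error suggests a cache precomputed for a 4096-token context, so the model's configured max sequence length must also cover the full prompt for this to work). The sketch below only demonstrates the chunking arithmetic with plain vectors; `model.forward` is referenced in a comment as an assumption about the surrounding code, not a verified fix for this model.

```rust
// Hypothetical sketch: split a long prompt into chunks and track the
// index_pos at which each chunk would be fed to the model. This does NOT
// call candle itself; it only shows how the offsets advance.
fn chunk_prompt(tokens: &[u32], max_chunk: usize) -> Vec<(usize, &[u32])> {
    // Returns (index_pos, chunk) pairs, where index_pos is the offset of
    // the chunk's first token within the full prompt.
    tokens
        .chunks(max_chunk)
        .enumerate()
        .map(|(i, c)| (i * max_chunk, c))
        .collect()
}

fn main() {
    // 4240 dummy tokens, matching the length in the error message.
    let prompt: Vec<u32> = (0..4240).collect();
    let chunks = chunk_prompt(&prompt, 512);
    for (index_pos, chunk) in &chunks {
        // In real code (assumed API, per the candle examples):
        // let logits = model.forward(&Tensor::new(*chunk, &device)?, *index_pos)?;
        println!("feed {} tokens at index_pos {}", chunk.len(), index_pos);
    }
    // 8 full chunks of 512 tokens (4096) plus one final chunk of 144.
    assert_eq!(chunks.len(), 9);
}
```

Whether chunked prefill alone suffices depends on the model setup: if the RoPE/KV caches are built for a 4096-token window, feeding past position 4096 will still fail until that limit is raised.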

Thank you in advance!

baleksey · May 07 '24 20:05