Matthew Steele
+1 for this feature
+1, would really like this feature. I'm doing some object updating with LLMs and was surprised that the bulk updates are actually the bottleneck.
@Ko8rah did you ever find a solution? I am having the same problem.
> > Yes it is. And hf-chat sends that stop token currently.
>
> Why does my local deployment of llama3-70b-instruct perform worse than Hugging...
+1, this would dramatically simplify my schema and queries.
Same here for 3.1-70B. Just adding that I'm using AWQ and can only run something like ~23k tokens on 2x A6000 Ada (96 GB total VRAM), while using vLLM I...