Matthias Reso
Hi @mhashas sorry for not getting to this sooner. Could you please give a bit more detail on how you imagine that integration? We're using pydantic dataclasses for our vllm...
Hi @james-joobs thanks for reporting your issue. For a bit more context, could you add a complete log showing your error? What platform are you on? Any information on your...
Thanks @james-joobs for the additional information. Now it's clearer to me where the issue is. The BaseHandler or a derived class is not executed directly from the CLI. It's...
Hi @richardkmichael thanks for the contribution, I'll go through the detailed changes tonight.
Hi @DerrickYLJ in your torchrun call you need to set --nproc_per_node to your number of GPUs. It will spin up a process for each GPU to split the model.
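For illustration, a hedged sketch of such a launch command (the script name, checkpoint directory, and GPU count below are placeholders, not taken from your setup):

```shell
# Hypothetical example: node with 4 GPUs, so torchrun spawns 4 processes,
# one per GPU, and the model is split across them.
torchrun --nproc_per_node 4 example_chat_completion.py \
    --ckpt_dir path/to/model/ \
    --tokenizer_path path/to/model/tokenizer.model
```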
Sorry @ISADORAyt, I wasn't paying attention that @DerrickYLJ was loading the 8B model. The code in this repo is only able to load the 8B on a single GPU and the...
> I think that the problem is due to Llama3-8B-Instruct only has one checkpoint file? So how does set nproc_per_node will help, or more specifically, how can we solve this?...
Hi @vandesa003 on the client (l6/locust) side, how many concurrent users/connections do you allow? It looks a bit like you're not providing enough requests to the server and the GPUs...
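To illustrate the point (a minimal sketch, not the locust API): the server batches incoming requests, so the client must keep enough requests in flight to saturate the GPUs. The helper name, `send_request` callable, and user counts below are all placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def run_concurrent(send_request, num_users=32, requests_per_user=4):
    """Fan out num_users * requests_per_user calls with num_users in flight.

    send_request is any zero-argument callable that performs one inference
    request (e.g. an HTTP POST to the serving endpoint); a stand-in here.
    Returns the list of all responses.
    """
    total = num_users * requests_per_user
    # max_workers caps how many requests are in flight at once; if this is
    # smaller than the server's batch size, the GPUs will sit partly idle.
    with ThreadPoolExecutor(max_workers=num_users) as pool:
        return list(pool.map(lambda _: send_request(), range(total)))
```

With locust the equivalent knob is simply the number of simulated users; the sketch just shows why raising client concurrency matters.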
Hi @mylesgoose I think that could be a great idea. Can you share a bit about how the interface would look after this integration?
Great! Could you package the checkpointing pieces into a PR? Happy to review this.