Matthias Reso

Results 92 comments of Matthias Reso

Hi @mhashas, sorry for not getting to this sooner. Could you please give a bit more detail on how you imagine that integration? We're using pydantic dataclasses for our vllm...

Hi @james-joobs thanks for reporting your issue. For a bit more context, could you add a complete log showing your error? What platform are you on? Any information on your...

Thanks @james-joobs for the additional information. Now it's clearer to me where the issue is. The BaseHandler or a derived class is not executed directly from the CLI. Its...

Hi @richardkmichael, thanks for the contribution. I'll go through the detailed changes tonight.

Hi @DerrickYLJ, in your torchrun call you need to set --nproc_per_node to your number of GPUs. torchrun will spin up one process per GPU to split the model.
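
As a rough illustration of the advice above, here is a minimal sketch that assembles a torchrun command line with one worker process per GPU. The helper name `build_torchrun_cmd` and the script name `example_script.py` are hypothetical and do not come from the thread; only the `--nproc_per_node` flag is taken from the comment.

```python
import shlex


def build_torchrun_cmd(num_gpus: int, script: str, *script_args: str) -> list[str]:
    """Return a torchrun argv that starts one worker process per GPU.

    `script` and `script_args` are placeholders for whatever entry point
    is actually launched; they are assumptions for this sketch.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return ["torchrun", "--nproc_per_node", str(num_gpus), script, *script_args]


# Example: a machine with 2 GPUs and a hypothetical script name.
cmd = build_torchrun_cmd(2, "example_script.py")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` or pasted into a shell. Whether more than one process is appropriate depends on the model being loaded.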

Sorry @ISADORAyt, I missed that @DerrickYLJ was loading the 8B model. The code in this repo is only able to load the 8B on a single GPU and the...

> I think that the problem is due to Llama3-8B-Instruct only has one checkpoint file? So how does set nproc_per_node will help, or more specifically, how can we solve this?...

Hi @vandesa003, on the client (k6/locust) side, how many concurrent users/connections do you allow? It looks a bit like you're not providing enough requests to the server and the GPUs...

Hi @mylesgoose, I think that could be a great idea. Can you share a bit about how the interface would look after this integration?

Great! Could you prepare the checkpointing pieces as a PR? Happy to review it.