Pierre Janeke
I see 2. was fixed with https://github.com/sgl-project/sglang/commit/b0890631a011be28d5ef5a0b4d5551fdeb94ab25
Does this mean the problem with 1. is fixed as well, @merrymercy?
Any progress on this?
@rlouf did you manage to make much progress yet?
I had a similar problem running on an EC2 g5.2xlarge instance (1 x A10G) using openchat/openchat3.5-0106. I have long sequences (6-7k tokens). A batch size of 19 sequences is fine,...
@hnyls2002 is it possible to launch 8 servers (one for each GPU) on a single machine with 8 GPUs?
I know this results in a full copy of the model on each GPU, but that is ideal for my use case. Apparently, you can do it with vllm...
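Just to make it concrete, something like the sketch below is what I have in mind (assuming `python -m sglang.launch_server` with `--model-path`/`--port` is the right entry point, and that `CUDA_VISIBLE_DEVICES` is enough to pin each server to one GPU; the model path and ports are placeholders):

```python
import os
import subprocess
import sys

MODEL = "meta-llama/Llama-2-7b-chat-hf"  # placeholder; use your own model path
BASE_PORT = 30000                        # placeholder base port

procs = []
for gpu in range(8):
    cmd = [
        sys.executable, "-m", "sglang.launch_server",
        "--model-path", MODEL,
        "--port", str(BASE_PORT + gpu),
    ]
    # Each server process only sees its own GPU and listens on its own port.
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)}
    procs.append(subprocess.Popen(cmd, env=env))

for p in procs:
    p.wait()
```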
Is this happening soon?
@MightyGoldenJA I think you can use the outlines integration in vllm and pass it as an argument to the vllm integration in langchain (I hope I used the right phrasing)....
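Roughly what I mean, going through vllm's OpenAI-compatible server (the part of vllm that is backed by outlines for guided decoding). I haven't checked the exact langchain wrapper arguments, so this sketch just uses the plain openai client against the same endpoint; the server start command, port, and the `guided_json` extra parameter are assumptions on my part, and langchain should be able to point at that endpoint too:

```python
# Sketch only: assumes a vllm OpenAI-compatible server was started with something like
#   python -m vllm.entrypoints.openai.api_server --model <model-path>
# and that guided decoding (backed by outlines) is accepted via `extra_body`.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

resp = client.chat.completions.create(
    model="<model-path>",  # must match the model the server was launched with
    messages=[{"role": "user", "content": "Give me a person as JSON."}],
    extra_body={"guided_json": schema},  # vllm-specific extra parameter
)
print(resp.choices[0].message.content)
```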
I am not very familiar with these libraries, but how about following what [aiopath](https://github.com/alexdelorenzo/aiopath) and [aiobotocore](https://github.com/aio-libs/aiobotocore) did? Perhaps they could be a source of inspiration if someone is willing to put...
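To illustrate the kind of pattern I mean (a thin async facade that offloads the existing blocking call to a thread, which is roughly what those libraries do), here is a minimal sketch; `generate` is just a stand-in name, not an actual function from the library:

```python
import asyncio

def generate(prompt: str) -> str:
    # Placeholder for the existing synchronous implementation.
    return prompt.upper()

async def generate_async(prompt: str) -> str:
    # asyncio.to_thread keeps the event loop free while the blocking call runs.
    return await asyncio.to_thread(generate, prompt)

async def main() -> None:
    results = await asyncio.gather(*(generate_async(p) for p in ["a", "b", "c"]))
    print(results)

asyncio.run(main())
```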