Isn't vllm compatible with other web frameworks?
Dear author, I used the Flask framework to build the underlying web service, but I ran into some strange phenomena (described below). Isn't vllm compatible with other web frameworks?
Phenomena:
① The first request is received and inference runs normally.
② On the second request, vllm directly calls abort_request.
③ After subsequent requests are received, vllm prints a "Received request" log, but no further inference is performed.
My environment: Flask, RTX 4090, vllm 0.2.0, Python 3.8, CUDA 11.8.
same question
Flask works with vllm 0.1.4, but not with later versions. The core code of vllm has changed significantly since 0.1.4, so later versions may not be compatible with other synchronous web frameworks.
Thank you! Do you think it's possible to use Flask to call the VLLM API?
vllm returns asynchronous iterators; even if you call vllm's API, what you receive is still an asynchronous iterator, and Flask would have to consume it. So I personally don't think that's feasible.
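For illustration, here is a minimal sketch of that point, written against vllm ~0.2.x (the exact names and the generate() signature may differ in other versions, and the model name is only a placeholder): the output of AsyncLLMEngine.generate() is an async generator, so it has to be driven by an event loop, which a plain synchronous Flask view does not provide.

```python
# Minimal sketch (assuming vllm ~0.2.x): consuming the async iterator that
# AsyncLLMEngine.generate() returns requires a running event loop.
import asyncio
import uuid

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine


async def main() -> None:
    # The model name is a placeholder; use the model you actually serve.
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="facebook/opt-125m"))

    sampling_params = SamplingParams(temperature=0.8, max_tokens=64)
    request_id = str(uuid.uuid4())

    final_output = None
    # Each iteration yields the partially generated RequestOutput for this
    # request; the last one holds the full completion.
    async for request_output in engine.generate("Hello, my name is",
                                                sampling_params, request_id):
        final_output = request_output

    print(final_output.outputs[0].text)


if __name__ == "__main__":
    # An event loop is required here; a synchronous Flask handler would have
    # to create and manage one itself, which is where the trouble starts.
    asyncio.run(main())
```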
Understood. Beyond just using its API, is there another way to integrate vllm? I aim to deploy it on my server and provide services via an iOS app.
Of course you can. You can refer to:
https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/api_server.py
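For example, a minimal client sketch for that demo server is below. It assumes the server is running on localhost:8000 and exposes the POST /generate endpoint from the linked api_server.py; the exact request and response fields may vary across vllm versions, so check the api_server.py you are running.

```python
# Rough client sketch for the demo api_server linked above (endpoint and
# field names are assumptions; verify them against your vllm version).
import requests

response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "San Francisco is a",
        "max_tokens": 32,
        "temperature": 0.8,
    },
    timeout=60,
)
response.raise_for_status()
# The demo server returns the generated text(s) as JSON.
print(response.json())
```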
As far as I know, FastChat is compatible with OpenAI API.
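As a rough illustration, calling such an OpenAI-compatible endpoint with the legacy openai (<1.0) Python client might look like this; the server address and model name are placeholders, and the server could be FastChat or vllm's own OpenAI-compatible entrypoint.

```python
# Sketch: query an OpenAI-compatible server with the legacy openai<1.0 client.
import openai

openai.api_key = "EMPTY"  # local servers typically ignore the key
openai.api_base = "http://localhost:8000/v1"  # placeholder address

completion = openai.Completion.create(
    model="facebook/opt-125m",  # placeholder; list models via /v1/models
    prompt="San Francisco is a",
    max_tokens=32,
)
print(completion.choices[0].text)
```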
Guys, I need some help regarding this:
- I want to integrate vLLM into my project and serve it as an API.
- Before this, I was using Flask with llama_cpp_python, but that currently doesn't support batched inference.
- So, when we make two parallel requests to my Flask API, it crashes (as expected).
- I am a little unsure about how to do this with vLLM.
I am thinking of the following:
- I think that, on my Linux server, I should run api_server.py on the local IP and port.
- Then I think I will be able to access that URL publicly from my application.
My question is: is that the recommended approach? I know it sounds obvious, but I am not very experienced with web serving, so will you guys please help me out here? This way I won't have to use Flask.
Thank you 🙏🏻
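If it helps, here is a rough sketch of that setup from the client side. The server address, port, and the /generate endpoint are assumptions based on the demo api_server.py mentioned earlier; the point is that several requests can be sent in parallel, because vllm batches concurrent requests on the server side rather than crashing the way a single llama_cpp_python worker behind Flask does.

```python
# Sketch: fire parallel requests at a running vllm api_server (address,
# port, and endpoint are placeholders based on the demo api_server.py).
from concurrent.futures import ThreadPoolExecutor

import requests

SERVER_URL = "http://<your-server-ip>:8000/generate"  # placeholder address


def ask(prompt: str) -> dict:
    resp = requests.post(
        SERVER_URL,
        json={"prompt": prompt, "max_tokens": 32},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    prompts = ["Hello, my name is", "The capital of France is"]
    # Both requests run concurrently; vllm schedules them together instead
    # of failing on the second one.
    with ThreadPoolExecutor(max_workers=2) as pool:
        for result in pool.map(ask, prompts):
            print(result)
```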