Curl requests not working
Hi there,
I had a question regarding working with the API Server from the instructions here.
I am running this after starting the container with the Docker command below (the NGC PyTorch image with CUDA 11.8):
docker run --gpus all -it --rm --shm-size=8g nvcr.io/nvidia/pytorch:22.12-py3
and then running these commands inside the container:
pip uninstall torch
pip install vllm
When running the default command python -m vllm.entrypoints.api_server, the server fails to start, returning:
INFO: Started server process [3820]
INFO: Waiting for application startup.
INFO: Application startup complete.
ERROR: [Errno 99] error while attempting to bind on address ('::1', 8000, 0, 0): cannot assign requested address
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
I try different ports and it still doesn't work.
However, when I pass a different host such as 127.0.0.1 via the --host parameter, the server starts and runs on Uvicorn, but I am unable to send curl requests to it.
I try this by opening a second terminal, running the Docker image and the rest of the commands above again, and then running both of the curl requests below. Both fail to connect, returning: curl: (7) Failed to connect to 127.0.0.1 port 8000: Connection refused. I also tried them outside of the container and the requests still fail.
curl -X POST http://127.0.0.1:8000/generate -H "Content-Type: application/json" -d '{ "prompt": "San Francisco is a", "use_beam_search": true, "n": 4, "temperature": 0 }'
and
curl -X POST http://localhost:8000/generate -H "Content-Type: application/json" -d '{
"prompt": "San Francisco is a",
"use_beam_search": true,
"n": 4,
"temperature": 0
}'
This also happens with the OpenAI-compatible server, so it would be great to get some support on sending requests to the server. Appreciate the help!
This looks like a Docker port publishing issue. Can you check whether you exposed port 8000 of your Docker container to your host? Reference. Or can you try to run curl inside the Docker container?
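For example, something along these lines (a rough sketch; <container_id> is a placeholder for whatever docker ps reports for your running container, and it assumes curl is available in the image):
docker port <container_id>
docker exec -it <container_id> curl http://127.0.0.1:8000/generate -H "Content-Type: application/json" -d '{"prompt": "San Francisco is a", "n": 1}'
The first command lists which container ports are published to the host; the second sends the request from inside the container, bypassing port publishing entirely.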
Hi,
Thanks for your response. Yeah, port 8000 was exposed, and I tried running the curl commands inside the Docker container, but they still could not reach the host.
I'm wondering whether the command python -m vllm.entrypoints.api_server is working as expected, because it was not for me.
Thanks!
Hi,
This worked for me:
python -m vllm.entrypoints.api_server --host 0.0.0.0 --port 8000
And when you run the image, add the port as well:
docker run --gpus all -it --rm -p 8000:8000 --shm-size=8g nvcr.io/nvidia/pytorch:22.12-py3
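With the port published and the server bound to 0.0.0.0, a quick check from the host (a trimmed-down version of your earlier request) should go through:
curl http://localhost:8000/generate -H "Content-Type: application/json" -d '{"prompt": "San Francisco is a", "n": 1}'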
Hi Carlos,
Thanks for the help! I was able to connect to the server. However, I noticed that despite setting --gpus all, when I ran nvidia-smi in another terminal, it wasn't using all the GPUs. Is there another argument to pass in to make sure the models on the vLLM server use all GPUs?
Glad it helped. Sorry, I only have 1 GPU, so I cannot run that test.
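That said, vLLM shards a model across GPUs via tensor parallelism, so --tensor-parallel-size may be the argument you're after (untested on my side; this sketch assumes 4 GPUs are visible in the container):
python -m vllm.entrypoints.api_server --host 0.0.0.0 --port 8000 --tensor-parallel-size 4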
Closing this issue since it's a Docker issue rather than a vLLM issue. Feel free to re-open if there are any further issues.