
(Fixable?) 400 Error with vLLM API - extra input

Open ccruttjr opened this issue 1 year ago

Howdy. When I run a vLLM server and then try to interact with it via HFClientVLLM, I get an error. Here is how to reproduce:

# Computer 1
pip install ray==2.20.0 vllm==0.4.2 dspy-ai==2.4.9 flash_attn==2.5.8
ray start --head --num-gpus 1
# Computer 2. Address will be Computer 1's IP address
pip install ray==2.20.0 vllm==0.4.2 dspy-ai==2.4.9 flash_attn==2.5.8
ray start --address='192.168.250.20:6379' --num-gpus 1
# Computer 1
python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m --host 0.0.0.0 --port 8000 -tp 2 --seed 42
# Any computer on network (including Computer 1 and 2). Address will be Computer 1's IP address
pip install dspy-ai==2.4.9 # If not already installed
python -c 'import dspy;lm = dspy.HFClientVLLM(model="facebook/opt-125m", port=8000, url="http://192.168.250.20", seed=42);dspy.configure(lm=lm);print(lm("Test"))'

If you only have one computer, this gives the same output:

pip install ray==2.20.0 vllm==0.4.2 dspy-ai==2.4.9 flash_attn==2.5.8
python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m --host 0.0.0.0 --port 8000 --seed 42
# Different tab
python -c 'import dspy;lm = dspy.HFClientVLLM(model="facebook/opt-125m", port=8000, url="http://localhost", seed=42);dspy.configure(lm=lm);print(lm("Test"))'

This gives me an output of

Failed to parse JSON response: {"object":"error","message":"[{'type': 'extra_forbidden', 'loc': ('body', 'port'), 'msg': 'Extra inputs are not permitted', 'input': 8000, 'url': 'https://errors.pydantic.dev/2.5/v/extra_forbidden'}, {'type': 'extra_forbidden', 'loc': ('body', 'url'), 'msg': 'Extra inputs are not permitted', 'input': ['http://192.168.250.10:8000'], 'url': 'https://errors.pydantic.dev/2.5/v/extra_forbidden'}]","type":"BadRequestError","param":null,"code":400}
Traceback (most recent call last):
  File "/home/daimyollc/anaconda3/envs/dspyVenv/lib/python3.10/site-packages/dsp/modules/hf_client.py", line 199, in _generate
    completions = json_response["choices"]
KeyError: 'choices'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/daimyollc/anaconda3/envs/dspyVenv/lib/python3.10/site-packages/dsp/modules/hf.py", line 190, in __call__
    response = self.request(prompt, **kwargs)
  File "/home/daimyollc/anaconda3/envs/dspyVenv/lib/python3.10/site-packages/dsp/modules/lm.py", line 26, in request
    return self.basic_request(prompt, **kwargs)
  File "/home/daimyollc/anaconda3/envs/dspyVenv/lib/python3.10/site-packages/dsp/modules/hf.py", line 147, in basic_request
    response = self._generate(prompt, **kwargs)
  File "/home/daimyollc/anaconda3/envs/dspyVenv/lib/python3.10/site-packages/dsp/modules/hf_client.py", line 208, in _generate
    raise Exception("Received invalid JSON response from server")
Exception: Received invalid JSON response from server

Going into dsp/modules/hf_client.py, I tried commenting out this line (the pydantic error above shows the client forwarding its constructor kwargs, including port and url, into the request body, which the server's schema forbids):

payload = {
    "model": self.kwargs["model"],
    "prompt": prompt,
    # **kwargs, commented this line out!
}
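
A less drastic alternative might be to forward only known sampling parameters instead of dropping **kwargs entirely. This is just a sketch of that idea, not the merged fix, and the ALLOWED_SAMPLING_KEYS set is my own guess at what the endpoint accepts:

ALLOWED_SAMPLING_KEYS = {"temperature", "top_p", "max_tokens", "n", "stop", "seed"}

payload = {
    "model": self.kwargs["model"],
    "prompt": prompt,
    # Forward only sampling parameters; client-level kwargs such as
    # port and url stay out of the request body, so the server's
    # pydantic schema no longer rejects the request.
    **{k: v for k, v in kwargs.items() if k in ALLOWED_SAMPLING_KEYS},
}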

Now, when I run

python -c 'import dspy;lm = dspy.HFClientVLLM(model="facebook/opt-125m", port=8000, url="http://localhost", seed=42);dspy.configure(lm=lm);print(lm("Test"))'

It returns

['osterone, Muscle Trackers, Water Race, Anabolic Shifting, Tally']

yay!
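
To double-check the server itself, independently of dspy, a direct request to the OpenAI-compatible endpoint also works. A minimal sketch, assuming the single-computer setup above (requests is the only dependency):

import requests

# Send only the fields the /v1/completions schema allows, so the
# extra_forbidden validation error cannot occur.
r = requests.post(
    "http://localhost:8000/v1/completions",
    json={"model": "facebook/opt-125m", "prompt": "Test", "max_tokens": 16},
)
print(r.json()["choices"][0]["text"])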

ccruttjr avatar May 09 '24 20:05 ccruttjr

I found this issue while looking for solutions to a similar-looking problem I am having with the latest dspy (2.4.9). However, the suggested workaround had no effect in my scenario. You might want to check whether your issue is already present in dspy 2.4.7.

Wolfsauge avatar May 14 '24 20:05 Wolfsauge

This issue should be solved by https://github.com/stanfordnlp/dspy/pull/1043

tom-doerr avatar May 22 '24 09:05 tom-doerr

TL;DR: I've tried the changes proposed in issue #1025 and PR #1043, and I can confirm that the fix works with dspy 2.4.9 and vllm 0.4.3, at least for the minimal example described below.

I was switching from ollama to vLLM in my dspy project and ended up hitting the same problem with dspy 2.4.9 and vllm 0.4.3. So I tried the bare-minimum example from the documentation:

Server:

 pixi r python -m vllm.entrypoints.api_server --trust-remote-code --model meta-llama/Llama-2-7b-hf --port 8081  

Code:

import dspy

model = "meta-llama/Llama-2-7b-hf"
lm = dspy.HFClientVLLM(model=model, port=8081, url="http://localhost")
dspy.configure(lm=lm)
qa = dspy.ChainOfThought('question -> answer')

response = qa(question="What is the capital of Paris?")
print(response.answer)

Output:

Failed to parse JSON response: {"detail":"Not Found"}
Traceback (most recent call last):
  File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dsp/modules/hf_client.py", line 199, in _generate
    completions = json_response["choices"]
                  ~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: 'choices'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jupyter/dev/funes/funes/main_test.py", line 23, in <module>
    response = qa(question="What is the capital of Paris?") #Prompted to vllm_llama2
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dspy/predict/predict.py", line 61, in __call__
    return self.forward(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dspy/predict/chain_of_thought.py", line 59, in forward
    return super().forward(signature=signature, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dspy/predict/predict.py", line 103, in forward
    x, C = dsp.generate(template, **config)(x, stage=self.stage)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dsp/primitives/predict.py", line 77, in do_generate
    completions: list[dict[str, Any]] = generator(prompt, **kwargs)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dsp/modules/hf.py", line 190, in __call__
    response = self.request(prompt, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dsp/modules/lm.py", line 26, in request
    return self.basic_request(prompt, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dsp/modules/hf.py", line 147, in basic_request
    response = self._generate(prompt, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dsp/modules/hf_client.py", line 208, in _generate
    raise Exception("Received invalid JSON response from server")
Exception: Received invalid JSON response from server
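
One way to narrow down where the {"detail":"Not Found"} comes from is to check which routes the server actually registers. A quick sketch, assuming the server command above is running on localhost:8081:

import requests

payload = {"prompt": "Test", "max_tokens": 8}
for path in ("/generate", "/v1/completions"):
    r = requests.post(f"http://localhost:8081{path}", json=payload)
    # 404 means this entrypoint does not serve that route;
    # any other status means the route exists.
    print(path, r.status_code)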

I tried removing the **kwargs params from the payload in hf_client.py as @ccruttjr suggested, but it didn't work. Then I tried the changes proposed in issue #1025 and PR #1043, and I can confirm that they work.

Hope to see this in a new release soon! Thanks!

Maybe there should be a new DSPy package release using the latest code in main; the PR that fixes this is already merged.

tom-doerr avatar Jun 07 '24 18:06 tom-doerr

Maybe there should be a new DSPy package release using the latest code in main; the PR that fixes this is already merged.

My dspy version is 2.4.17 and my vllm version is 0.5.4.

What fixes this issue, @tom-doerr?

brando90 avatar Sep 14 '24 00:09 brando90

This installs vllm, but the dspy vllm server fails:

pip install --upgrade pip
pip uninstall torchvision vllm vllm-flash-attn flash-attn xformers
pip install torch==2.2.1 vllm==0.4.1 

@tom-doerr any help?

brando90 avatar Sep 14 '24 00:09 brando90

@brando90 How do you have 2.4.17? Isn't the newest version 2.4.16? Do you get the exact same error message? If so, could you post the relevant code?

tom-doerr avatar Sep 14 '24 00:09 tom-doerr

@tom-doerr apologies, I no longer have access to the bash session from when I wrote that message; it was likely a typo. I can confirm I do have 2.4.16 though:

(uutils) brando9@skampere1~ $ pip list | grep dspy
dspy-ai                                 2.4.16

and vllm version is:

(uutils) brando9@skampere1~ $ pip list | grep vllm
vllm                                    0.4.1
vllm_nccl_cu12                          2.18.1.0.4.0
(uutils) brando9@skampere1~ $ pip list | grep torch
fast-pytorch-kmeans                     0.2.0.1
torch                                   2.2.1

FYI, my flash attention doesn't work:

INFO 09-13 18:56:14 selector.py:77] Cannot use FlashAttention backend because the flash_attn package is not found. Please install it for better performance.

Thanks for taking the time to respond and help.

brando90 avatar Sep 14 '24 01:09 brando90

Could you post the error message you are getting? Do you use any less commonly used DSPy features?

tom-doerr avatar Sep 14 '24 02:09 tom-doerr

Could you post the error message you are getting? Do you use any less commonly used DSPy features?

@tom-doerr happy to! Here's the full output:

(snap_cluster_setup_py311) brando9@skampere1~ $ conda activate uutils
(uutils) brando9@skampere1~ $ python ~/ultimate-utils/py_src/uutils/dspy_uu/examples/full_toy_vllm_local_mdl.py

  0%|                                                                                                     | 0/3 [00:00<?, ?it/s]Failed to parse JSON response: {"detail":"Not Found"}
2024-09-14T02:04:17.834140Z [error    ] Failed to run or to evaluate example Example({'question': 'What is the capital of France?', 'answer': 'Paris'}) (input_keys={'question'}) with <function exact_match_metric at 0x7f024e0cc0e0> due to Received invalid JSON response from server. [dspy.teleprompt.bootstrap] filename=bootstrap.py lineno=211
Failed to parse JSON response: {"detail":"Not Found"}
2024-09-14T02:04:17.834909Z [error    ] Failed to run or to evaluate example Example({'question': "Who wrote '1984'?", 'answer': 'George Orwell'}) (input_keys={'question'}) with <function exact_match_metric at 0x7f024e0cc0e0> due to Received invalid JSON response from server. [dspy.teleprompt.bootstrap] filename=bootstrap.py lineno=211
Failed to parse JSON response: {"detail":"Not Found"}
2024-09-14T02:04:17.835576Z [error    ] Failed to run or to evaluate example Example({'question': 'What is the boiling point of water?', 'answer': '100°C'}) (input_keys={'question'}) with <function exact_match_metric at 0x7f024e0cc0e0> due to Received invalid JSON response from server. [dspy.teleprompt.bootstrap] filename=bootstrap.py lineno=211
100%|████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 633.96it/s]
Bootstrapped 0 full traces after 3 examples in round 0.
Failed to parse JSON response: {"detail":"Not Found"}
Traceback (most recent call last):
  File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dsp/modules/hf_client.py", line 243, in _generate
    completions = json_response["choices"]
                  ~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: 'choices'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lfs/skampere1/0/brando9/ultimate-utils/py_src/uutils/dspy_uu/examples/full_toy_vllm_local_mdl.py", line 65, in <module>
    pred = compiled_simple_qa(my_question)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dspy/primitives/program.py", line 26, in __call__
    return self.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/brando9/ultimate-utils/py_src/uutils/dspy_uu/examples/full_toy_vllm_local_mdl.py", line 50, in forward
    prediction = self.generate_answer(question=question)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dspy/primitives/program.py", line 26, in __call__
    return self.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dspy/predict/chain_of_thought.py", line 36, in forward
    return self._predict(signature=signature, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dspy/predict/predict.py", line 91, in __call__
    return self.forward(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dspy/predict/predict.py", line 128, in forward
    completions = old_generate(demos, signature, kwargs, config, self.lm, self.stage)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dspy/predict/predict.py", line 155, in old_generate
    x, C = dsp.generate(template, **config)(x, stage=stage)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dsp/primitives/predict.py", line 73, in do_generate
    completions: list[dict[str, Any]] = generator(prompt, **kwargs)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dsp/modules/hf.py", line 193, in __call__
    response = self.request(prompt, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dsp/modules/lm.py", line 27, in request
    return self.basic_request(prompt, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dsp/modules/hf.py", line 147, in basic_request
    response = self._generate(prompt, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dsp/modules/hf_client.py", line 252, in _generate
    raise Exception("Received invalid JSON response from server")
Exception: Received invalid JSON response from server

brando90 avatar Sep 14 '24 02:09 brando90

vllm server running on the side:

(snap_cluster_setup_py311) brando9@skampere1~ $ conda activate uutils
(uutils) brando9@skampere1~ $ python -m vllm.entrypoints.api_server --model meta-llama/Llama-2-7b-hf --port 8080

INFO 09-13 19:04:13 llm_engine.py:98] Initializing an LLM engine (v0.4.1) with config: model='meta-llama/Llama-2-7b-hf', speculative_config=None, tokenizer='meta-llama/Llama-2-7b-hf', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0)
INFO 09-13 19:04:13 utils.py:608] Found nccl from library /lfs/skampere1/0/brando9/.config/vllm/nccl/cu12/libnccl.so.2.18.1
INFO 09-13 19:04:14 selector.py:77] Cannot use FlashAttention backend because the flash_attn package is not found. Please install it for better performance.
INFO 09-13 19:04:14 selector.py:33] Using XFormers backend.
INFO 09-13 19:04:15 weight_utils.py:193] Using model weights format ['*.safetensors']
INFO 09-13 19:04:17 model_runner.py:173] Loading model weights took 12.5523 GB
INFO 09-13 19:04:18 gpu_executor.py:119] # GPU blocks: 7406, # CPU blocks: 512
INFO 09-13 19:04:19 model_runner.py:976] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 09-13 19:04:19 model_runner.py:980] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 09-13 19:04:23 model_runner.py:1057] Graph capturing finished in 4 secs.
INFO:     Started server process [348702]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
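
(Note: vllm.entrypoints.api_server is the legacy demo server and, as far as I know, only registers /generate. If the dspy client posts to the OpenAI-style /v1/completions route, that alone would produce the {"detail":"Not Found"} above. The OpenAI-compatible server would be started with something like the line below.)

python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-hf --port 8080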

brando90 avatar Sep 14 '24 02:09 brando90

Source code is here: https://github.com/brando90/ultimate-utils/blob/master/experiments/experiments/2024/september/09_13_2024.md

brando90 avatar Sep 14 '24 02:09 brando90

Don't see how that's related to the 400 error. I had the same issue you seem to have: https://github.com/stanfordnlp/dspy/issues/1041. This issue also seems relevant: https://github.com/stanfordnlp/dspy/issues/1242.

I don't have a solution for you. You could switch to a different model, check whether someone else has posted a solution to your problem (there are quite a few related issues), or switch to the experimental new DSPy 2.5, which has a new backend.
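
If you go the 2.5 route, the new backend talks to vLLM's OpenAI-compatible server directly. A sketch of what that looks like; the model name, port, and api_key value here are placeholders:

import dspy

# The 2.5-style LM client speaks the OpenAI protocol via LiteLLM,
# so it can point straight at vLLM's OpenAI-compatible server.
lm = dspy.LM(
    "openai/meta-llama/Llama-2-7b-hf",
    api_base="http://localhost:8080/v1",
    api_key="local",  # vLLM ignores the key unless --api-key is set
)
dspy.configure(lm=lm)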

tom-doerr avatar Sep 14 '24 02:09 tom-doerr

Don't see how that's related to the 400 error. I had the same issue you seem to have: #1041. This issue also seems relevant: #1242.

Darn, embarrassing; apologies. I must admit I've been quite sleep-deprived and commented on the wrong issue.

How do I install 2.5?

I can't find it here: https://github.com/stanfordnlp/dspy
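
(Maybe installing straight from main would work in the meantime? Something like pip install git+https://github.com/stanfordnlp/dspy.git, though I haven't tried it.)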

Thanks for all the help btw!

brando90 avatar Sep 14 '24 15:09 brando90