(Fixable?) 400 Error with vLLM API - extra input
Howdy. It seems that when running a vLLM server and then attempting to interact with it via HFClientVLLM, I get an error message. Here is how to reproduce:
# Computer 1
pip install ray==2.20.0 vllm==0.4.2 dspy-ai==2.4.9 flash_attn==2.5.8
ray start --head --num-gpus 1
# Computer 2. Address will be Computer 1's IP address
pip install ray==2.20.0 vllm==0.4.2 dspy-ai==2.4.9 flash_attn==2.5.8
ray start --address='192.168.250.20:6379' --num-gpus 1
# Computer 1
python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m --host 0.0.0.0 --port 8000 -tp 2 --seed 42
# Any computer on network (including Computer 1 and 2). Address will be Computer 1's IP address
pip install dspy-ai==2.4.9 # If not already installed
python -c 'import dspy;lm = dspy.HFClientVLLM(model="facebook/opt-125m", port=8000, url="http://192.168.250.20", seed=42);dspy.configure(lm=lm);print(lm("Test"))'
If you only have one computer, this gives the same output:
pip install ray==2.20.0 vllm==0.4.2 dspy-ai==2.4.9 flash_attn==2.5.8
python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m --host 0.0.0.0 --port 8000 --seed 42
# Different tab
python -c 'import dspy;lm = dspy.HFClientVLLM(model="facebook/opt-125m", port=8000, url="http://localhost", seed=42);dspy.configure(lm=lm);print(lm("Test"))'
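For readability, here is the same client call from the one-liner above as a short script (equivalent behavior assumed; the server from the previous step must be reachable on localhost:8000):

import dspy

# Expanded form of the `python -c` one-liner above.
lm = dspy.HFClientVLLM(
    model="facebook/opt-125m",
    port=8000,
    url="http://localhost",
    seed=42,
)
dspy.configure(lm=lm)
print(lm("Test"))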
This gives me the following output:
Failed to parse JSON response: {"object":"error","message":"[{'type': 'extra_forbidden', 'loc': ('body', 'port'), 'msg': 'Extra inputs are not permitted', 'input': 8000, 'url': 'https://errors.pydantic.dev/2.5/v/extra_forbidden'}, {'type': 'extra_forbidden', 'loc': ('body', 'url'), 'msg': 'Extra inputs are not permitted', 'input': ['http://192.168.250.10:8000'], 'url': 'https://errors.pydantic.dev/2.5/v/extra_forbidden'}]","type":"BadRequestError","param":null,"code":400}
Traceback (most recent call last):
File "/home/daimyollc/anaconda3/envs/dspyVenv/lib/python3.10/site-packages/dsp/modules/hf_client.py", line 199, in _generate
completions = json_response["choices"]
KeyError: 'choices'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/daimyollc/anaconda3/envs/dspyVenv/lib/python3.10/site-packages/dsp/modules/hf.py", line 190, in __call__
response = self.request(prompt, **kwargs)
File "/home/daimyollc/anaconda3/envs/dspyVenv/lib/python3.10/site-packages/dsp/modules/lm.py", line 26, in request
return self.basic_request(prompt, **kwargs)
File "/home/daimyollc/anaconda3/envs/dspyVenv/lib/python3.10/site-packages/dsp/modules/hf.py", line 147, in basic_request
response = self._generate(prompt, **kwargs)
File "/home/daimyollc/anaconda3/envs/dspyVenv/lib/python3.10/site-packages/dsp/modules/hf_client.py", line 208, in _generate
raise Exception("Received invalid JSON response from server")
Exception: Received invalid JSON response from server
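The 400 itself comes from the OpenAI-compatible server's pydantic validation: it forbids extra fields in the request body, and the client-side kwargs (port, url) are leaking into the payload. A minimal sketch that reproduces just the 400 with plain requests (assuming the server above is on localhost:8000):

import requests

# Posting an extra field such as "port" to /v1/completions should
# trigger the same "extra_forbidden" 400 seen above.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",
        "prompt": "Test",
        "port": 8000,  # extra field -> 400 extra_forbidden
    },
)
print(resp.status_code, resp.text)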
Going into dsp/modules/hf_client.py, I tried commenting out this line:
payload = {
"model": self.kwargs["model"],
"prompt": prompt,
# **kwargs, commented this line out!
}
Now, when I run
python -c 'import dspy;lm = dspy.HFClientVLLM(model="facebook/opt-125m", port=8000, url="http://localhost", seed=42);dspy.configure(lm=lm);print(lm("Test"))'
It returns
['osterone, Muscle Trackers, Water Race, Anabolic Shifting, Tally']
yay!
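For reference, a less blunt variant of the same workaround (a sketch, not a vetted fix) would drop only the client-side connection keys instead of all kwargs, so sampling parameters still get forwarded. This goes in the same spot in dsp/modules/hf_client.py; the key names are taken from the 400 error above:

# Hypothetical variant: forward sampling parameters but strip the
# client-side connection kwargs that the server rejects.
CLIENT_ONLY_KEYS = {"port", "url"}

payload = {
    "model": self.kwargs["model"],
    "prompt": prompt,
    **{k: v for k, v in kwargs.items() if k not in CLIENT_ONLY_KEYS},
}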
I found this issue while looking for solutions to a similar-looking issue I am having with the latest dspy (2.4.9). However, the suggested workaround had no effect in my scenario. You might want to check whether your issue is already present in dspy 2.4.7.
This issue should be solved by https://github.com/stanfordnlp/dspy/pull/1043
TL;DR: I've tried the changes proposed in Issue #1025 and PR #1043, and I can confirm that the fix works with dspy version 2.4.9 and vllm version 0.4.3, at least for the minimal example described below.
I was switching from Ollama to vLLM in my dspy project and ended up having the same problem with dspy version 2.4.9 and vllm version 0.4.3. So I tried the bare-minimum example you provide in the documentation:
Server:
pixi r python -m vllm.entrypoints.api_server --trust-remote-code --model meta-llama/Llama-2-7b-hf --port 8081
Code:
model="meta-llama/Llama-2-7b-hf"
lm = dspy.HFClientVLLM(model=model, port=8081, url="http://localhost")
dspy.configure(lm=lm)
qa = dspy.ChainOfThought('question -> answer')
response = qa(question="What is the capital of Paris?")
print(response.answer)
Output:
Failed to parse JSON response: {"detail":"Not Found"}
Traceback (most recent call last):
File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dsp/modules/hf_client.py", line 199, in _generate
completions = json_response["choices"]
~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: 'choices'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jupyter/dev/funes/funes/main_test.py", line 23, in <module>
response = qa(question="What is the capital of Paris?") #Prompted to vllm_llama2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dspy/predict/predict.py", line 61, in __call__
return self.forward(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dspy/predict/chain_of_thought.py", line 59, in forward
return super().forward(signature=signature, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dspy/predict/predict.py", line 103, in forward
x, C = dsp.generate(template, **config)(x, stage=self.stage)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dsp/primitives/predict.py", line 77, in do_generate
completions: list[dict[str, Any]] = generator(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dsp/modules/hf.py", line 190, in __call__
response = self.request(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dsp/modules/lm.py", line 26, in request
return self.basic_request(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dsp/modules/hf.py", line 147, in basic_request
response = self._generate(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jupyter/dev/funes/.pixi/envs/default/lib/python3.11/site-packages/dsp/modules/hf_client.py", line 208, in _generate
raise Exception("Received invalid JSON response from server")
Exception: Received invalid JSON response from server
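The {"detail":"Not Found"} body is FastAPI's 404 response, which suggests the client is posting to a route this server doesn't expose: vllm.entrypoints.api_server only serves /generate, while the OpenAI-compatible vllm.entrypoints.openai.api_server serves /v1/completions. A quick probe (sketch, assuming the server above on port 8081) to check which routes exist:

import requests

# A 404 means the route does not exist on the running server;
# any other status (including validation errors) means it does.
base = "http://localhost:8081"
for path in ("/generate", "/v1/completions"):
    r = requests.post(base + path, json={"prompt": "Test"})
    print(path, r.status_code)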
I tried removing the **kwargs params from the payload in hf_client.py as @ccruttjr suggested, but it doesn't work.
Then I tried the changes proposed in Issue #1025 and PR #1043, and I can confirm that they work.
Hope to see this soon in the new release! Thanks!
Maybe there should be a new DSPy package release using the latest code in main; the PR that fixes this is already merged.
My dspy version is 2.4.17 and my vllm version is 0.5.4. What fixes this issue, @tom-doerr?
This installs vllm, but the dspy vLLM server fails:
pip install --upgrade pip
pip uninstall torchvision vllm vllm-flash-attn flash-attn xformers
pip install torch==2.2.1 vllm==0.4.1
@tom-doerr any help?
@brando90 How do you have 2.4.17? Isn't the newest version 2.4.16? Do you get the exact same error message? If so, could you post the relevant code?
@tom-doerr apologies, I no longer have access to the bash session from when I wrote that message; it was likely a typo. I confirm I do have 2.4.16, though:
(uutils) brando9@skampere1~ $ pip list | grep dspy
dspy-ai 2.4.16
and vllm version is:
(uutils) brando9@skampere1~ $ pip list | grep vllm
vllm 0.4.1
vllm_nccl_cu12 2.18.1.0.4.0
(uutils) brando9@skampere1~ $ pip list | grep torch
fast-pytorch-kmeans 0.2.0.1
torch 2.2.1
My flash attention doesn't work, FYI:
INFO 09-13 18:56:14 selector.py:77] Cannot use FlashAttention backend because the flash_attn package is not found. Please install it for better performance.
Thanks for taking the time to respond and help.
Could you post the error message you are getting? Do you use any less commonly used DSPy features?
@tom-doerr happy to help!
(snap_cluster_setup_py311) brando9@skampere1~ $ conda activate uutils
(uutils) brando9@skampere1~ $ python ~/ultimate-utils/py_src/uutils/dspy_uu/examples/full_toy_vllm_local_mdl.py
0%| | 0/3 [00:00<?, ?it/s]Failed to parse JSON response: {"detail":"Not Found"}
2024-09-14T02:04:17.834140Z [error ] Failed to run or to evaluate example Example({'question': 'What is the capital of France?', 'answer': 'Paris'}) (input_keys={'question'}) with <function exact_match_metric at 0x7f024e0cc0e0> due to Received invalid JSON response from server. [dspy.teleprompt.bootstrap] filename=bootstrap.py lineno=211
Failed to parse JSON response: {"detail":"Not Found"}
2024-09-14T02:04:17.834909Z [error ] Failed to run or to evaluate example Example({'question': "Who wrote '1984'?", 'answer': 'George Orwell'}) (input_keys={'question'}) with <function exact_match_metric at 0x7f024e0cc0e0> due to Received invalid JSON response from server. [dspy.teleprompt.bootstrap] filename=bootstrap.py lineno=211
Failed to parse JSON response: {"detail":"Not Found"}
2024-09-14T02:04:17.835576Z [error ] Failed to run or to evaluate example Example({'question': 'What is the boiling point of water?', 'answer': '100°C'}) (input_keys={'question'}) with <function exact_match_metric at 0x7f024e0cc0e0> due to Received invalid JSON response from server. [dspy.teleprompt.bootstrap] filename=bootstrap.py lineno=211
100%|████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 633.96it/s]
Bootstrapped 0 full traces after 3 examples in round 0.
Failed to parse JSON response: {"detail":"Not Found"}
Traceback (most recent call last):
File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dsp/modules/hf_client.py", line 243, in _generate
completions = json_response["choices"]
~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: 'choices'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/lfs/skampere1/0/brando9/ultimate-utils/py_src/uutils/dspy_uu/examples/full_toy_vllm_local_mdl.py", line 65, in <module>
pred = compiled_simple_qa(my_question)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dspy/primitives/program.py", line 26, in __call__
return self.forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/brando9/ultimate-utils/py_src/uutils/dspy_uu/examples/full_toy_vllm_local_mdl.py", line 50, in forward
prediction = self.generate_answer(question=question)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dspy/primitives/program.py", line 26, in __call__
return self.forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dspy/predict/chain_of_thought.py", line 36, in forward
return self._predict(signature=signature, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dspy/predict/predict.py", line 91, in __call__
return self.forward(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dspy/predict/predict.py", line 128, in forward
completions = old_generate(demos, signature, kwargs, config, self.lm, self.stage)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dspy/predict/predict.py", line 155, in old_generate
x, C = dsp.generate(template, **config)(x, stage=stage)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dsp/primitives/predict.py", line 73, in do_generate
completions: list[dict[str, Any]] = generator(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dsp/modules/hf.py", line 193, in __call__
response = self.request(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dsp/modules/lm.py", line 27, in request
return self.basic_request(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dsp/modules/hf.py", line 147, in basic_request
response = self._generate(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/brando9/miniconda/envs/uutils/lib/python3.11/site-packages/dsp/modules/hf_client.py", line 252, in _generate
raise Exception("Received invalid JSON response from server")
Exception: Received invalid JSON response from server
The vLLM server running on the side:
(snap_cluster_setup_py311) brando9@skampere1~ $ conda activate uutils
(uutils) brando9@skampere1~ $ python -m vllm.entrypoints.api_server --model meta-llama/Llama-2-7b-hf --port 8080
INFO 09-13 19:04:13 llm_engine.py:98] Initializing an LLM engine (v0.4.1) with config: model='meta-llama/Llama-2-7b-hf', speculative_config=None, tokenizer='meta-llama/Llama-2-7b-hf', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0)
INFO 09-13 19:04:13 utils.py:608] Found nccl from library /lfs/skampere1/0/brando9/.config/vllm/nccl/cu12/libnccl.so.2.18.1
INFO 09-13 19:04:14 selector.py:77] Cannot use FlashAttention backend because the flash_attn package is not found. Please install it for better performance.
INFO 09-13 19:04:14 selector.py:33] Using XFormers backend.
INFO 09-13 19:04:15 weight_utils.py:193] Using model weights format ['*.safetensors']
INFO 09-13 19:04:17 model_runner.py:173] Loading model weights took 12.5523 GB
INFO 09-13 19:04:18 gpu_executor.py:119] # GPU blocks: 7406, # CPU blocks: 512
INFO 09-13 19:04:19 model_runner.py:976] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 09-13 19:04:19 model_runner.py:980] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 09-13 19:04:23 model_runner.py:1057] Graph capturing finished in 4 secs.
INFO: Started server process [348702]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
source code here: https://github.com/brando90/ultimate-utils/blob/master/experiments/experiments/2024/september/09_13_2024.md
I don't see how that's related to the 400 error. I had the same issue you seem to have: https://github.com/stanfordnlp/dspy/issues/1041. This issue also seems to be relevant: https://github.com/stanfordnlp/dspy/issues/1242
I don't have a solution for you. You could switch to a different model, check whether someone else has posted a solution to your problem (there are quite a few issues related to this), or switch to the experimental new DSPy 2.5, which has a new backend.
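If I understand the new 2.5 backend correctly (a sketch; exact parameter names may differ), connecting to a vLLM OpenAI-compatible server would look roughly like:

import dspy

# Sketch, assuming DSPy 2.5's new LM client and a vLLM
# OpenAI-compatible server on localhost:8000.
lm = dspy.LM(
    "openai/meta-llama/Llama-2-7b-hf",  # litellm-style provider/model route
    api_base="http://localhost:8000/v1",
    api_key="local",  # vLLM ignores the key unless --api-key is set
)
dspy.configure(lm=lm)
print(lm("Test"))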
Darn, embarrassing. Apologies, I must admit I've been quite sleep-deprived and commented on the wrong issue.
How do I install 2.5? I can't find it here: https://github.com/stanfordnlp/dspy
Thanks for all the help btw!