Crash in vLLM
Trying to install it in NVIDIA's PyTorch container, I get the error below at runtime. The same issue occurs when installing on a Lambda GPU Cloud H100 instance (all defaults).
root@0971a018b7ec:/workspace/openchat# python -m ochat.serving.openai_api_server --model_type openchat_v2 --model openchat/openchat_v2_w --engine-use-ray --worker-use-ray
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/workspace/openchat/ochat/serving/openai_api_server.py", line 21, in <module>
from vllm.engine.arg_utils import AsyncEngineArgs
File "/usr/local/lib/python3.10/dist-packages/vllm/__init__.py", line 4, in <module>
from vllm.engine.async_llm_engine import AsyncLLMEngine
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 7, in <module>
from vllm.engine.llm_engine import LLMEngine
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 16, in <module>
from vllm.worker.worker import Worker
File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 8, in <module>
from vllm.model_executor import get_model, InputMetadata, set_random_seed
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/__init__.py", line 2, in <module>
from vllm.model_executor.model_loader import get_model
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 9, in <module>
from vllm.model_executor.models import * # pylint: disable=wildcard-import
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/__init__.py", line 1, in <module>
from vllm.model_executor.models.bloom import BloomForCausalLM
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/bloom.py", line 31, in <module>
from vllm.model_executor.layers.activation import get_act_fn
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/activation.py", line 5, in <module>
from vllm import activation_ops
ImportError: /usr/local/lib/python3.10/dist-packages/vllm/activation_ops.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
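The mangled symbol in the ImportError decodes to `c10::detail::torchCheckFail(...)`, a function exported by libtorch. Seeing it as *undefined* when `activation_ops` loads usually means the prebuilt vLLM wheel was compiled against a different PyTorch build (and C++ ABI) than the torch shipped in the container; NVIDIA's container carries a custom torch, so a PyPI vLLM wheel can mismatch it. A small check of the symbol itself (the remedy, commonly suggested rather than guaranteed, is rebuilding vLLM from source against the installed torch):

```python
# The exact mangled name from the ImportError above.
sym = "_ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"

# The "__cxx11" fragment marks the new libstdc++ std::string ABI the
# wheel was built against; the installed torch must expose the same ABI,
# otherwise the dynamic linker reports exactly this "undefined symbol".
print("__cxx11" in sym)
```

If the ABIs differ, reinstalling vLLM so it compiles against the container's torch (rather than pulling a prebuilt wheel) is the usual fix.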
I get another error:
2023-07-18 15:53:21,350 ERROR services.py:1207 -- Failed to start the dashboard , return code 1
2023-07-18 15:53:21,350 ERROR services.py:1232 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2023-07-18 15:53:21,350 ERROR services.py:1276 --
The last 20 lines of /tmp/ray/session_2023-07-18_15-53-19_820841_46100/logs/dashboard.log (it contains the error message from the dashboard):
File "/gai_data/anaconda3/envs/fastchat/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/gai_data/anaconda3/envs/fastchat/lib/python3.10/site-packages/ray/dashboard/modules/log/log_manager.py", line 8, in <module>
from ray.util.state.common import (
File "/gai_data/anaconda3/envs/fastchat/lib/python3.10/site-packages/ray/util/state/__init__.py", line 1, in <module>
from ray.util.state.api import (
File "/gai_data/anaconda3/envs/fastchat/lib/python3.10/site-packages/ray/util/state/api.py", line 17, in <module>
from ray.util.state.common import (
File "/gai_data/anaconda3/envs/fastchat/lib/python3.10/site-packages/ray/util/state/common.py", line 120, in <module>
@dataclass(init=True)
File "/gai_data/anaconda3/envs/fastchat/lib/python3.10/site-packages/pydantic/dataclasses.py", line 139, in dataclass
assert init is False, 'pydantic.dataclasses.dataclass only supports init=False'
AssertionError: pydantic.dataclasses.dataclass only supports init=False
2023-07-18 15:53:21,467 INFO worker.py:1636 -- Started a local Ray instance.
[2023-07-18 15:53:22,307 E 46100 46100] core_worker.cc:193: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory
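For context on the assertion above: the standard-library `dataclass` decorator accepts `init=True` (its default), but in this environment Ray's `common.py` ends up calling pydantic's dataclass wrapper, which asserts `init is False`. A minimal stdlib contrast (the class name here is purely illustrative):

```python
from dataclasses import dataclass

# The stdlib decorator is happy with init=True -- the crash only occurs
# when the name `dataclass` resolves to pydantic's wrapper instead.
@dataclass(init=True)
class TaskState:
    name: str
    done: bool = False

print(TaskState("demo").name)  # stdlib generates __init__ as expected
```

A frequently reported workaround is aligning the installed pydantic with the version Ray's own requirements pin; check Ray's requirements for your Ray release rather than taking a specific pin from this thread.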
Have you fixed the issue? We released a new version and tested the following setup:
conda create -y --name openchat
conda activate openchat
conda install -y python
conda install -y cudatoolkit-dev -c conda-forge
pip3 install torch torchvision torchaudio
pip3 install packaging ninja
pip3 install --no-build-isolation "flash-attn<2"
pip3 install ochat
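After the steps above, a quick sanity check before launching the server can save a debugging round trip. These checks are generic (not from the release notes); the launch command repeats the one from the original report:

```shell
# Confirm torch sees the GPU and flash-attn imports cleanly
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import flash_attn; print(flash_attn.__version__)"

# Then start the server as in the original report
python -m ochat.serving.openai_api_server --model_type openchat_v2 \
    --model openchat/openchat_v2_w --engine-use-ray --worker-use-ray
```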
Any update on this issue? I ran into this problem too.
@imoneoi given that OpenChat adds a special token, the same changes have to be made in vLLM, right? I believe vLLM uses the default Hugging Face model and tokenizer.
Did OpenChat integrate with vllm?
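To make the concern concrete, here is a toy illustration (no real tokenizer; the vocab and `encode` helper are hypothetical stand-ins): if the serving stack loads a tokenizer whose vocab lacks OpenChat's special token, that token gets split into ordinary pieces instead of a single id, which breaks turn handling.

```python
# Hypothetical stand-in for a Hugging Face vocab
vocab = {"hello": 0, "world": 1}
special = "<|end_of_turn|>"

def encode(text, vocab):
    # Falls back to per-character pieces for out-of-vocab strings,
    # mimicking how an unpatched tokenizer would mangle the token.
    return [vocab[text]] if text in vocab else list(text)

print(len(encode(special, vocab)) == 1)  # False: vocab lacks the token

vocab[special] = len(vocab)              # what registering it would do
print(len(encode(special, vocab)) == 1)  # True: now a single id
```

So whether vLLM "just works" depends on it loading the same tokenizer (with the added special token) that OpenChat trains with, not a stock one.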