ray-llm
Error: Tokenizer class does not exist when loading a local model
When I try to load a local model, the following error is raised: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.
I have set trust_remote_code=True, and I have used vLLM directly with this model before; it works well.
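For reference, this is roughly how I run the same checkpoint with plain vLLM (model path as in my config below) — a minimal sketch, not my exact script:

```python
from vllm import LLM, SamplingParams

# Plain vLLM loads this checkpoint fine when trust_remote_code is set.
llm = LLM(
    model="/opt/models/myapp-baichuan2-13b-chat-2/",
    trust_remote_code=True,
)
outputs = llm.generate(
    ["### Instruction:\nSay hi.\n### Response:\n"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```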
(ServeController pid=67769) Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::ServeReplica:ray-llm-myapp-baichuan:VLLMDeployment:pai--myapp-baichuan2-13b-chat-2.initialize_and_get_metadata() (pid=72307, ip=172.17.0.2, actor_id=438f9032f1ec94c824d6519d01000000, repr=<ray.serve._private.replica.ServeReplica:ray-llm-myapp-baichuan:VLLMDeployment:pai--myapp-baichuan2-13b-chat-2 object at 0x7f4e97177c70>)
(ServeController pid=67769)   File "/home/ray/anaconda3/lib/python3.9/concurrent/futures/_base.py", line 439, in result
(ServeController pid=67769)     return self.__get_result()
(ServeController pid=67769)   File "/home/ray/anaconda3/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
(ServeController pid=67769)     raise self._exception
(ServeController pid=67769)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/replica.py", line 442, in initialize_and_get_metadata
(ServeController pid=67769)     raise RuntimeError(traceback.format_exc()) from None
(ServeController pid=67769) RuntimeError: Traceback (most recent call last):
(ServeController pid=67769)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/replica.py", line 430, in initialize_and_get_metadata
(ServeController pid=67769)     await self._initialize_replica()
(ServeController pid=67769)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/replica.py", line 190, in initialize_replica
(ServeController pid=67769)     await sync_to_async(_callable.__init__)(*init_args, **init_kwargs)
(ServeController pid=67769)   File "/home/ray/anaconda3/lib/python3.9/site-packages/rayllm/backend/server/vllm/vllm_deployment.py", line 37, in __init__
(ServeController pid=67769)     await self.engine.start()
(ServeController pid=67769)   File "/home/ray/anaconda3/lib/python3.9/site-packages/rayllm/backend/llm/vllm/vllm_engine.py", line 78, in start
(ServeController pid=67769)     pg, runtime_env = await self.node_initializer.initialize_node(self.llm_app)
(ServeController pid=67769)   File "/home/ray/anaconda3/lib/python3.9/site-packages/rayllm/backend/llm/vllm/vllm_node_initializer.py", line 52, in initialize_node
(ServeController pid=67769)     await self._initialize_local_node(engine_config)
(ServeController pid=67769)   File "/home/ray/anaconda3/lib/python3.9/concurrent/futures/thread.py", line 58, in run
(ServeController pid=67769)     result = self.fn(*self.args, **self.kwargs)
(ServeController pid=67769)   File "/home/ray/anaconda3/lib/python3.9/site-packages/rayllm/backend/llm/vllm/vllm_node_initializer.py", line 72, in _initialize_local_node
(ServeController pid=67769)     _ = AutoTokenizer.from_pretrained(engine_config.actual_hf_model_id)
(ServeController pid=67769)   File "/home/ray/anaconda3/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 748, in from_pretrained
(ServeController pid=67769)     raise ValueError(
(ServeController pid=67769) ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.
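Note the failing frame: vllm_node_initializer.py appears to call AutoTokenizer.from_pretrained(engine_config.actual_hf_model_id) without forwarding trust_remote_code. The same error can be reproduced outside Ray Serve (model path taken from my config):

```python
from transformers import AutoTokenizer

# Same call as rayllm's node initializer: no trust_remote_code, so the
# checkpoint's custom BaichuanTokenizer class is never imported.
tok = AutoTokenizer.from_pretrained("/opt/models/myapp-baichuan2-13b-chat-2/")
# ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.

# Passing trust_remote_code=True loads the tokenizer without complaint:
tok = AutoTokenizer.from_pretrained(
    "/opt/models/myapp-baichuan2-13b-chat-2/",
    trust_remote_code=True,
)
```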
Model YAML:

enabled: true
deployment_config:
  autoscaling_config:
    min_replicas: 1
    initial_replicas: 1
    max_replicas: 2
    target_num_ongoing_requests_per_replica: 1.0
    metrics_interval_s: 10.0
    look_back_period_s: 30.0
    smoothing_factor: 1.0
    downscale_delay_s: 300.0
    upscale_delay_s: 90.0
  ray_actor_options:
    num_cpus: 4
engine_config:
  model_id: pai/myapp-baichuan2-13b-chat-2
  hf_model_id: /opt/models/myapp-baichuan2-13b-chat-2/
  engine_kwargs:
    trust_remote_code: true
  runtime_env:
    env_vars:
      YOUR_ENV_VAR: "your_value"
  generation:
    prompt_format:  # see the rendering sketch after this config
      system: "{instruction}\n"  # System message. Will default to default_system_message
      assistant: "### Response:\n{instruction}\n"  # Past assistant message. Used in chat completions API.
      trailing_assistant: "### Response:\n"  # New assistant message. After this point, model will generate tokens.
      user: "### Instruction:\n{instruction}\n"  # User message.
      default_system_message: "Below is an instruction that describes a task. Write a response that appropriately completes the request."  # Default system message.
      system_in_user: false  # Whether the system prompt is inside the user prompt. If true, the user field should include '{system}'
      add_system_tags_even_if_message_is_empty: false  # Whether to include the system tags even if the user message is empty.
      strip_whitespace: false  # Whether to automatically strip whitespace from left and right of user-supplied messages for chat completions
    stopping_sequences: ["### Response:", "### End"]
scaling_config:
  num_workers: 1
  num_gpus_per_worker: 1
  num_cpus_per_worker: 4
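Since the prompt_format fields above define how chat messages are flattened into a single prompt, here is a small sketch of how I understand them to combine (a hypothetical render function for illustration; not rayllm's actual code):

```python
# Hypothetical illustration of the prompt_format above; not rayllm's implementation.
PROMPT_FORMAT = {
    "system": "{instruction}\n",
    "assistant": "### Response:\n{instruction}\n",
    "trailing_assistant": "### Response:\n",
    "user": "### Instruction:\n{instruction}\n",
    "default_system_message": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
    ),
}

def render(messages):
    # System message falls back to default_system_message (system_in_user is false).
    system = next(
        (m["content"] for m in messages if m["role"] == "system"),
        PROMPT_FORMAT["default_system_message"],
    )
    parts = [PROMPT_FORMAT["system"].format(instruction=system)]
    for m in messages:
        if m["role"] in ("user", "assistant"):
            parts.append(PROMPT_FORMAT[m["role"]].format(instruction=m["content"]))
    # The model generates after trailing_assistant; generation halts at
    # stopping_sequences such as "### Response:" or "### End".
    parts.append(PROMPT_FORMAT["trailing_assistant"])
    return "".join(parts)

print(render([{"role": "user", "content": "Say hi."}]))
```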
Serve config:

applications:
- name: ray-llm-myapp-baichuan
  route_prefix: /
  import_path: rayllm.backend:router_application
  args:
    models:
      - "/data/ray-llm/serve_configs/baichuan2-13b-chat.yaml"
Update transformers and try again:

pip install transformers -U
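It may also be worth verifying that the upgrade took effect in the environment Ray actually runs in; a minimal check, assuming the model path from the config above:

```python
import transformers
from transformers import AutoTokenizer

# Confirm the interpreter Ray uses sees the upgraded transformers version.
print(transformers.__version__)

# The tokenizer should resolve when remote code is trusted.
tok = AutoTokenizer.from_pretrained(
    "/opt/models/myapp-baichuan2-13b-chat-2/",
    trust_remote_code=True,
)
print(type(tok).__name__)  # expect BaichuanTokenizer
```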