BUG: failure when loading baichuan-2-chat (BaichuanPreTrainedModel)
Describe the bug
Launching the built-in baichuan-2-chat model fails during loading: transformers 4.35.2 raises TypeError: BaichuanPreTrainedModel._set_gradient_checkpointing() got an unexpected keyword argument 'enable'.
To Reproduce
To help us reproduce this bug, please provide the information below:
- Your Python version: python3.10
- The version of xinference you use: xinference[transformers]==0.6.5
- Versions of crucial packages: transformers==4.35.2 ; transformers-stream-generator==0.0.4
- Full stack of the error:

(Xinference) root@llm-gpu-1:/export2/xinference/cache# XINFERENCE_HOME=/export2/xinference xinference-local --host 0.0.0.0 --port 9997
2023-12-04 22:19:03,074 - modelscope - INFO - PyTorch version 2.1.1 Found.
2023-12-04 22:19:03,075 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-12-04 22:19:03,115 - modelscope - INFO - Loading done! Current index file version is 1.9.5, with md5 9fe5fb456640f68ca2da267999aff888 and a total number of 945 components indexed
2023-12-04 22:19:03,185 xinference.core.supervisor 1349396 INFO Xinference supervisor 0.0.0.0:33819 started
2023-12-04 22:19:03,211 xinference.core.worker 1349396 INFO Xinference worker 0.0.0.0:33819 started
2023-12-04 22:19:03,211 xinference.core.worker 1349396 INFO Purge cache directory: /export2/xinference/cache
2023-12-04 22:19:08,253 xinference.api.restful_api 1349359 INFO Starting Xinference at endpoint: http://0.0.0.0:9997
2023-12-04 22:19:15,260 - modelscope - INFO - PyTorch version 2.1.1 Found.
2023-12-04 22:19:15,261 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-12-04 22:19:15,301 - modelscope - INFO - Loading done! Current index file version is 1.9.5, with md5 9fe5fb456640f68ca2da267999aff888 and a total number of 945 components indexed
2023-12-04 22:19:15,378 xinference.model.llm.llm_family 1349396 INFO Caching from Modelscope: baichuan-inc/Baichuan2-13B-Chat
2023-12-04 22:19:15,787 - modelscope - INFO - Use user-specified model revision: v1.0.3
Downloading: 100% 716/716 [00:00<00:00, 3.35MB/s]
Downloading: 100% 217/217 [00:00<00:00, 1.05MB/s]
Downloading: 100% 1.52k/1.52k [00:00<00:00, 7.60MB/s]
Downloading: 100% 285/285 [00:00<00:00, 1.60MB/s]
Downloading: 100% 2.90k/2.90k [00:00<00:00, 14.9MB/s]
Downloading: 100% 31.8k/31.8k [00:00<00:00, 4.03MB/s]
Downloading: 100% 3.28k/3.28k [00:00<00:00, 16.2MB/s]
Downloading: 100% 9.29G/9.29G [07:00<00:00, 23.7MB/s]
Downloading: 37% 3.44G/9.26G [02:47<04:44, 22.0MB/s]
2023-12-04 22:29:38,143 - modelscope - WARNING - Download file from: 4026531840 to: 4194303999 failed, will retry
Downloading: 100% 9.26G/9.26G [07:01<00:00, 23.6MB/s]
Downloading: 100% 7.33G/7.33G [05:32<00:00, 23.7MB/s]
Downloading: 100% 22.7k/22.7k [00:00<00:00, 3.19MB/s]
Downloading: 100% 8.97k/8.97k [00:00<00:00, 25.0MB/s]
Downloading: 100% 10.7k/10.7k [00:00<00:00, 30.9MB/s]
Downloading: 100% 544/544 [00:00<00:00, 2.68MB/s]
Downloading: 100% 8.82k/8.82k [00:00<00:00, 22.9MB/s]
Downloading: 100% 1.91M/1.91M [00:00<00:00, 9.88MB/s]
Downloading: 100% 954/954 [00:00<00:00, 5.21MB/s]
2023-12-04 22:40:27,279 xinference.core.worker 1349396 ERROR Failed to load model 18d7d912-92b0-11ee-8349-fa163e602ae6-1-0
Traceback (most recent call last):
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xinference/core/worker.py", line 336, in launch_builtin_model
    await model_ref.load()
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 657, in send
    result = await self._run_coro(message.message_id, coro)
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 368, in _run_coro
    return await coro
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xoscar/api.py", line 306, in on_receive
    return await super().on_receive(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in on_receive
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 524, in xoscar.core._BaseActor.on_receive
    result = func(*args, **kwargs)
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xinference/core/model.py", line 166, in load
    self._model.load()
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xinference/model/llm/pytorch/core.py", line 182, in load
    self._model, self._tokenizer = self._load_model(**kwargs)
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xinference/model/llm/pytorch/baichuan.py", line 60, in _load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
  File "/root/.cache/huggingface/modules/transformers_modules/baichuan-2-chat-pytorch-13b/modeling_baichuan.py", line 670, in from_pretrained
    return super(BaichuanForCausalLM, cls).from_pretrained(pretrained_model_name_or_path, *model_args,
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3236, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/baichuan-2-chat-pytorch-13b/modeling_baichuan.py", line 539, in __init__
    self.model = BaichuanModel(config)
  File "/root/.cache/huggingface/modules/transformers_modules/baichuan-2-chat-pytorch-13b/modeling_baichuan.py", line 305, in __init__
    self.post_init()
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1159, in post_init
    self._backward_compatibility_gradient_checkpointing()
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1163, in _backward_compatibility_gradient_checkpointing
    self.gradient_checkpointing_enable()
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1872, in gradient_checkpointing_enable
    self._set_gradient_checkpointing(enable=True, gradient_checkpointing_func=gradient_checkpointing_func)
TypeError: [address=0.0.0.0:37993, pid=1349498] BaichuanPreTrainedModel._set_gradient_checkpointing() got an unexpected keyword argument 'enable'
2023-12-04 22:40:27,383 xinference.api.restful_api 1349359 ERROR [address=0.0.0.0:37993, pid=1349498] BaichuanPreTrainedModel._set_gradient_checkpointing() got an unexpected keyword argument 'enable'
Traceback (most recent call last):
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 417, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 657, in send
    result = await self._run_coro(message.message_id, coro)
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 368, in _run_coro
    return await coro
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xoscar/api.py", line 306, in on_receive
    return await super().on_receive(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in on_receive
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
    result = await result
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xinference/core/supervisor.py", line 476, in launch_builtin_model
    await _launch_one_model(rep_model_uid)
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xinference/core/supervisor.py", line 445, in _launch_one_model
    await worker_ref.launch_builtin_model(
  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
    result = await result
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 33, in wrapped
    ret = await func(*args, **kwargs)
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xinference/core/worker.py", line 336, in launch_builtin_model
    await model_ref.load()
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 657, in send
    result = await self._run_coro(message.message_id, coro)
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 368, in _run_coro
    return await coro
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xoscar/api.py", line 306, in on_receive
    return await super().on_receive(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in on_receive
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 524, in xoscar.core._BaseActor.on_receive
    result = func(*args, **kwargs)
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xinference/core/model.py", line 166, in load
    self._model.load()
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xinference/model/llm/pytorch/core.py", line 182, in load
    self._model, self._tokenizer = self._load_model(**kwargs)
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/xinference/model/llm/pytorch/baichuan.py", line 60, in _load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
  File "/root/.cache/huggingface/modules/transformers_modules/baichuan-2-chat-pytorch-13b/modeling_baichuan.py", line 670, in from_pretrained
    return super(BaichuanForCausalLM, cls).from_pretrained(pretrained_model_name_or_path, *model_args,
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3236, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/baichuan-2-chat-pytorch-13b/modeling_baichuan.py", line 539, in __init__
    self.model = BaichuanModel(config)
  File "/root/.cache/huggingface/modules/transformers_modules/baichuan-2-chat-pytorch-13b/modeling_baichuan.py", line 305, in __init__
    self.post_init()
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1159, in post_init
    self._backward_compatibility_gradient_checkpointing()
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1163, in _backward_compatibility_gradient_checkpointing
    self.gradient_checkpointing_enable()
  File "/export2/miniconda/envs/Xinference/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1872, in gradient_checkpointing_enable
    self._set_gradient_checkpointing(enable=True, gradient_checkpointing_func=gradient_checkpointing_func)
TypeError: [address=0.0.0.0:37993, pid=1349498] BaichuanPreTrainedModel._set_gradient_checkpointing() got an unexpected keyword argument 'enable'
- Minimized code to reproduce the error: just run `xinference-local --host 0.0.0.0 --port 9997`, then `xinference launch --model-name baichuan-2-chat --size-in-billions 13 --model-format pytorch --quantization 8-bit`
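If it helps triage, the failure can presumably also be reproduced without xinference, since the TypeError is raised inside transformers' from_pretrained/post_init path. A minimal sketch (model id taken from the log above; transformers==4.35.2 assumed, and the error should occur during model construction regardless of quantization):

```python
# Hypothetical minimal reproduction outside xinference.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan2-13B-Chat",
    trust_remote_code=True,  # loads the repo's modeling_baichuan.py
)
# With transformers 4.35.x this is expected to raise the same
# TypeError from _set_gradient_checkpointing() during post_init().
```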
Expected behavior
The baichuan-2-chat model should load successfully.
Additional context
I am not familiar with the internals of the transformers package, but my guess is that some API changed between the old and new versions, so the Baichuan remote code no longer matches what transformers calls.
Downgrading the transformers package to 4.34.0 seems to work. Maybe you can pin it in requirements.txt.
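Until a pin lands, a defensive check like the following could at least fail fast with a clearer message; this is only an illustrative sketch (the helper name and the 4.35.0 cutoff are my assumptions based on this traceback, not anything from the xinference codebase):

```python
# Illustrative only: fail fast if the installed transformers is new enough to pass
# the new keyword arguments that Baichuan2's remote code cannot accept.
from importlib.metadata import version

from packaging.version import Version


def check_transformers_for_baichuan2() -> None:
    installed = Version(version("transformers"))
    if installed >= Version("4.35.0"):  # assumed cutoff, inferred from this issue
        raise RuntimeError(
            f"transformers {installed} calls _set_gradient_checkpointing(enable=..., "
            "gradient_checkpointing_func=...), which Baichuan2's remote code does not "
            "accept; try downgrading, e.g. transformers==4.34.0, as noted in this issue."
        )


check_transformers_for_baichuan2()
```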
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.