FEAT: Support Phi-1 & Phi-1.5
Resolve #462
Hmm, it seems Phi-1.5 cannot be added directly as a PyTorch model and run; some additional glue code might be needed.
Got the following error when trying to run the model with default settings:
ModuleNotFoundError: [address=127.0.0.1:56946, pid=42111] No module named 'transformers_modules.phi-1'
Full Log
2023-10-05 17:09:40,791 xinference 42089 INFO Xinference successfully started. Endpoint: http://127.0.0.1:9997
2023-10-05 17:09:40,792 xinference.core.worker 42089 DEBUG Worker actor initialized with main pool: 127.0.0.1:21605
2023-10-05 17:09:40,792 xinference.core.supervisor 42089 DEBUG Enter add_worker, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, '127.0.0.1:21605'), kwargs: {}
2023-10-05 17:09:40,792 xinference.core.supervisor 42089 INFO Worker 127.0.0.1:21605 has been added successfully
2023-10-05 17:09:40,792 xinference.core.supervisor 42089 DEBUG Leave add_worker, elapsed time: 0 ms
2023-10-05 17:09:40,793 xinference.deploy.worker 42089 INFO Xinference worker successfully started.
2023-10-05 17:09:41,139 xinference.core.supervisor 42089 DEBUG Enter list_model_registrations, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM'), kwargs: {}
2023-10-05 17:09:41,139 xinference.core.supervisor 42089 DEBUG Leave list_model_registrations, elapsed time: 0 ms
2023-10-05 17:09:41,207 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'baichuan'), kwargs: {}
2023-10-05 17:09:41,207 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,208 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'baichuan-2'), kwargs: {}
2023-10-05 17:09:41,208 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,209 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'baichuan-2-chat'), kwargs: {}
2023-10-05 17:09:41,209 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,210 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'baichuan-chat'), kwargs: {}
2023-10-05 17:09:41,210 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,211 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'chatglm'), kwargs: {}
2023-10-05 17:09:41,211 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,211 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'chatglm2'), kwargs: {}
2023-10-05 17:09:41,211 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,212 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'chatglm2-32k'), kwargs: {}
2023-10-05 17:09:41,212 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,213 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'code-llama'), kwargs: {}
2023-10-05 17:09:41,213 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,214 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'code-llama-instruct'), kwargs: {}
2023-10-05 17:09:41,214 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,218 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'code-llama-python'), kwargs: {}
2023-10-05 17:09:41,218 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,219 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'falcon'), kwargs: {}
2023-10-05 17:09:41,219 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,220 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'falcon-instruct'), kwargs: {}
2023-10-05 17:09:41,220 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,220 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'glaive-coder'), kwargs: {}
2023-10-05 17:09:41,220 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,221 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'gpt-2'), kwargs: {}
2023-10-05 17:09:41,221 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,221 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'internlm-20b'), kwargs: {}
2023-10-05 17:09:41,221 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,221 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'internlm-7b'), kwargs: {}
2023-10-05 17:09:41,221 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,227 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'internlm-chat-20b'), kwargs: {}
2023-10-05 17:09:41,228 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,228 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'internlm-chat-7b'), kwargs: {}
2023-10-05 17:09:41,228 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,229 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'llama-2'), kwargs: {}
2023-10-05 17:09:41,229 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,229 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'llama-2-chat'), kwargs: {}
2023-10-05 17:09:41,229 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,230 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'OpenBuddy'), kwargs: {}
2023-10-05 17:09:41,230 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,230 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'opt'), kwargs: {}
2023-10-05 17:09:41,230 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,230 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'orca'), kwargs: {}
2023-10-05 17:09:41,230 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,233 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'phi-1.5'), kwargs: {}
2023-10-05 17:09:41,233 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,234 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'qwen-chat'), kwargs: {}
2023-10-05 17:09:41,234 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,234 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'starchat-beta'), kwargs: {}
2023-10-05 17:09:41,234 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,235 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'starcoder'), kwargs: {}
2023-10-05 17:09:41,235 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,235 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'starcoderplus'), kwargs: {}
2023-10-05 17:09:41,235 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,235 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'tiny-llama'), kwargs: {}
2023-10-05 17:09:41,235 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,240 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'vicuna-v1.3'), kwargs: {}
2023-10-05 17:09:41,240 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,243 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'vicuna-v1.5'), kwargs: {}
2023-10-05 17:09:41,243 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,244 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'vicuna-v1.5-16k'), kwargs: {}
2023-10-05 17:09:41,244 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,245 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'wizardlm-v1.0'), kwargs: {}
2023-10-05 17:09:41,245 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,245 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'wizardmath-v1.0'), kwargs: {}
2023-10-05 17:09:41,245 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:48,019 xinference.core.supervisor 42089 DEBUG Enter launch_builtin_model, model_uid: e5cf40e0-63cb-11ee-b038-c1055c423403, model_name: phi-1.5, model_size: 1, model_format: pytorch, quantization: none, replica: 1
2023-10-05 17:09:48,019 xinference.core.worker 42089 DEBUG Enter get_model_count, args: (<xinference.core.worker.WorkerActor object at 0x1597319d0>,), kwargs: {}
2023-10-05 17:09:48,019 xinference.core.worker 42089 DEBUG Leave get_model_count, elapsed time: 0 ms
2023-10-05 17:09:48,019 xinference.core.worker 42089 DEBUG Enter launch_builtin_model, args: (<xinference.core.worker.WorkerActor object at 0x1597319d0>,), kwargs: {'model_uid': 'e5cf40e0-63cb-11ee-b038-c1055c423403-1-0', 'model_name': 'phi-1.5', 'model_size_in_billions': 1, 'model_format': 'pytorch', 'quantization': 'none', 'model_type': 'LLM', 'n_gpu': 'auto'}
2023-10-05 17:09:48,019 xinference.core.supervisor 42089 DEBUG Enter is_local_deployment, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>,), kwargs: {}
2023-10-05 17:09:48,019 xinference.core.supervisor 42089 DEBUG Leave is_local_deployment, elapsed time: 0 ms
2023-10-05 17:09:48,024 xinference.model.llm.llm_family 42089 INFO Caching from Hugging Face: microsoft/phi-1_5
2023-10-05 17:09:48,043 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,243 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "GET /api/models/microsoft/phi-1_5/revision/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef HTTP/1.1" 200 2363
Fetching 14 files: 0%| | 0/14 [00:00<?, ?it/s]2023-10-05 17:09:48,271 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,272 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,273 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,275 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,276 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,278 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,280 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,280 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,399 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/configuration_mixformer_sequential.py HTTP/1.1" 200 0
2023-10-05 17:09:48,399 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/config.json HTTP/1.1" 200 0
2023-10-05 17:09:48,400 filelock 42089 DEBUG Attempting to acquire lock 5798238032 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/c2b5ff89977b9726d5c3e54c28e17aa36d83f268.lock
2023-10-05 17:09:48,400 filelock 42089 DEBUG Attempting to acquire lock 5798347600 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/8cc2d51cba96dbebf98898e731cca1d9c5977f71.lock
2023-10-05 17:09:48,400 filelock 42089 DEBUG Lock 5798238032 acquired on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/c2b5ff89977b9726d5c3e54c28e17aa36d83f268.lock
2023-10-05 17:09:48,400 filelock 42089 DEBUG Lock 5798347600 acquired on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/8cc2d51cba96dbebf98898e731cca1d9c5977f71.lock
2023-10-05 17:09:48,404 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/merges.txt HTTP/1.1" 200 0
2023-10-05 17:09:48,404 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/generation_config.json HTTP/1.1" 200 0
2023-10-05 17:09:48,404 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/README.md HTTP/1.1" 200 0
2023-10-05 17:09:48,405 filelock 42089 DEBUG Attempting to acquire lock 5795770704 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/6f26581545cae8f8f375c5f0f90d956c194a20fd.lock
2023-10-05 17:09:48,405 filelock 42089 DEBUG Lock 5795770704 acquired on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/6f26581545cae8f8f375c5f0f90d956c194a20fd.lock
2023-10-05 17:09:48,409 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/Research%20License.docx HTTP/1.1" 200 0
2023-10-05 17:09:48,409 filelock 42089 DEBUG Attempting to acquire lock 5788588304 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/390505dd1ab349a07cf9764b9dc733d28ea28385.lock
2023-10-05 17:09:48,409 filelock 42089 DEBUG Lock 5788588304 acquired on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/390505dd1ab349a07cf9764b9dc733d28ea28385.lock
2023-10-05 17:09:48,413 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/.gitattributes HTTP/1.1" 200 0
Fetching 14 files: 7%|███▋ | 1/14 [00:00<00:01, 7.05it/s]2023-10-05 17:09:48,444 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "GET /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/config.json HTTP/1.1" 200 707
Downloading (…)0e7049ef/config.json: 100%|███████████████████████████████| 707/707 [00:00<00:00, 6.15MB/s]
2023-10-05 17:09:48,445 filelock 42089 DEBUG Attempting to release lock 5798238032 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/c2b5ff89977b9726d5c3e54c28e17aa36d83f268.lock
2023-10-05 17:09:48,445 filelock 42089 DEBUG Lock 5798238032 released on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/c2b5ff89977b9726d5c3e54c28e17aa36d83f268.lock
2023-10-05 17:09:48,446 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "GET /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/configuration_mixformer_sequential.py HTTP/1.1" 200 1860
Downloading (…)former_sequential.py: 100%|███████████████████████████| 1.86k/1.86k [00:00<00:00, 28.2MB/s]
2023-10-05 17:09:48,447 filelock 42089 DEBUG Attempting to release lock 5798347600 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/8cc2d51cba96dbebf98898e731cca1d9c5977f71.lock
2023-10-05 17:09:48,447 filelock 42089 DEBUG Lock 5798347600 released on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/8cc2d51cba96dbebf98898e731cca1d9c5977f71.lock
2023-10-05 17:09:48,451 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/modeling_mixformer_sequential.py HTTP/1.1" 200 0
2023-10-05 17:09:48,451 filelock 42089 DEBUG Attempting to acquire lock 5798359312 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/7d4f7229ad6e5f85e7ff4fba20847d4052bb74d2.lock
2023-10-05 17:09:48,451 filelock 42089 DEBUG Lock 5798359312 acquired on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/7d4f7229ad6e5f85e7ff4fba20847d4052bb74d2.lock
2023-10-05 17:09:48,452 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "GET /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/README.md HTTP/1.1" 200 8001
Downloading (…)5a0e7049ef/README.md: 100%|███████████████████████████| 8.00k/8.00k [00:00<00:00, 46.5MB/s]
2023-10-05 17:09:48,454 filelock 42089 DEBUG Attempting to release lock 5795770704 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/6f26581545cae8f8f375c5f0f90d956c194a20fd.lock
2023-10-05 17:09:48,454 filelock 42089 DEBUG Lock 5795770704 released on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/6f26581545cae8f8f375c5f0f90d956c194a20fd.lock
2023-10-05 17:09:48,454 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/pytorch_model.bin HTTP/1.1" 302 0
2023-10-05 17:09:48,457 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "GET /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/Research%20License.docx HTTP/1.1" 200 38892
2023-10-05 17:09:48,460 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/special_tokens_map.json HTTP/1.1" 200 0
Downloading (…)earch%20License.docx: 100%|███████████████████████████| 38.9k/38.9k [00:00<00:00, 17.6MB/s]
2023-10-05 17:09:48,461 filelock 42089 DEBUG Attempting to release lock 5788588304 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/390505dd1ab349a07cf9764b9dc733d28ea28385.lock
2023-10-05 17:09:48,461 filelock 42089 DEBUG Lock 5788588304 released on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/390505dd1ab349a07cf9764b9dc733d28ea28385.lock
2023-10-05 17:09:48,479 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/added_tokens.json HTTP/1.1" 200 0
2023-10-05 17:09:48,489 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/tokenizer.json HTTP/1.1" 200 0
2023-10-05 17:09:48,491 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/tokenizer_config.json HTTP/1.1" 200 0
2023-10-05 17:09:48,496 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/vocab.json HTTP/1.1" 200 0
2023-10-05 17:09:48,500 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "GET /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/modeling_mixformer_sequential.py HTTP/1.1" 200 28749
Downloading (…)former_sequential.py: 100%|████████████████████████████| 28.7k/28.7k [00:00<00:00, 142MB/s]
2023-10-05 17:09:48,501 filelock 42089 DEBUG Attempting to release lock 5798359312 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/7d4f7229ad6e5f85e7ff4fba20847d4052bb74d2.lock
2023-10-05 17:09:48,501 filelock 42089 DEBUG Lock 5798359312 released on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/7d4f7229ad6e5f85e7ff4fba20847d4052bb74d2.lock
Fetching 14 files: 100%|██████████████████████████████████████████████████| 14/14 [00:00<00:00, 61.05it/s]
2023-10-05 17:09:48,502 xinference.model.llm.core 42089 DEBUG Launching e5cf40e0-63cb-11ee-b038-c1055c423403-1-0 with PytorchModel
2023-10-05 17:09:50,561 xinference.core.supervisor 42089 DEBUG Enter terminate_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'e5cf40e0-63cb-11ee-b038-c1055c423403'), kwargs: {'suppress_exception': True}
2023-10-05 17:09:50,561 xinference.core.supervisor 42089 DEBUG Leave terminate_model, elapsed time: 0 ms
2023-10-05 17:09:50,561 xinference.core.restful_api 42089 ERROR [address=127.0.0.1:56946, pid=42111] No module named 'transformers_modules.phi-1'
Traceback (most recent call last):
File "/Users/bojunfeng/cs/inference/xinference/core/restful_api.py", line 404, in launch_model
model_uid = await self._supervisor_ref.launch_builtin_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 288, in __pyx_actor_method_wrapper
File "xoscar/core.pyx", line 422, in _handle_actor_result
File "xoscar/core.pyx", line 465, in _run_actor_async_generator
File "xoscar/core.pyx", line 466, in xoscar.core._BaseActor._run_actor_async_generator
File "xoscar/core.pyx", line 471, in xoscar.core._BaseActor._run_actor_async_generator
File "/Users/bojunfeng/cs/inference/xinference/core/supervisor.py", line 227, in launch_builtin_model
yield _launch_one_model(rep_model_uid)
File "xoscar/core.pyx", line 476, in xoscar.core._BaseActor._run_actor_async_generator
File "xoscar/core.pyx", line 422, in _handle_actor_result
File "xoscar/core.pyx", line 465, in _run_actor_async_generator
File "xoscar/core.pyx", line 466, in xoscar.core._BaseActor._run_actor_async_generator
File "xoscar/core.pyx", line 471, in xoscar.core._BaseActor._run_actor_async_generator
File "/Users/bojunfeng/cs/inference/xinference/core/supervisor.py", line 206, in _launch_one_model
yield worker_ref.launch_builtin_model(
File "xoscar/core.pyx", line 476, in xoscar.core._BaseActor._run_actor_async_generator
File "xoscar/core.pyx", line 396, in _handle_actor_result
File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
File "/Users/bojunfeng/cs/inference/xinference/core/utils.py", line 27, in wrapped
ret = await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/cs/inference/xinference/core/worker.py", line 187, in launch_builtin_model
await model_ref.load()
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/xoscar/backends/pool.py", line 657, in send
result = await self._run_coro(message.message_id, coro)
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/xoscar/backends/pool.py", line 368, in _run_coro
return await coro
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/xoscar/api.py", line 306, in __on_receive__
return await super().__on_receive__(message) # type: ignore
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 558, in __on_receive__
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
File "xoscar/core.pyx", line 524, in xoscar.core._BaseActor.__on_receive__
File "/Users/bojunfeng/cs/inference/xinference/core/model.py", line 117, in load
self._model.load()
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/cs/inference/xinference/model/llm/pytorch/core.py", line 205, in load
self._model, self._tokenizer = self._load_model(kwargs)
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/cs/inference/xinference/model/llm/pytorch/core.py", line 124, in _load_model
model = AutoModelForCausalLM.from_pretrained(
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 482, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1016, in from_pretrained
config_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 497, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 199, in get_class_in_module
module = importlib.import_module(module_path)
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/anaconda3/lib/python3.11/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1126, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1126, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1140, in _find_and_load_unlocked
ModuleNotFoundError: [address=127.0.0.1:56946, pid=42111] No module named 'transformers_modules.phi-1'
2023-10-05 17:09:50,572 urllib3.connectionpool 42089 DEBUG Starting new HTTP connection (1): 127.0.0.1:9997
2023-10-05 17:09:50,573 xinference.core.supervisor 42089 DEBUG Enter describe_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'e5cf40e0-63cb-11ee-b038-c1055c423403'), kwargs: {}
2023-10-05 17:09:50,573 xinference.core.restful_api 42089 ERROR Model not found in the model list, uid: e5cf40e0-63cb-11ee-b038-c1055c423403
Traceback (most recent call last):
File "/Users/bojunfeng/cs/inference/xinference/core/restful_api.py", line 361, in describe_model
return await self._supervisor_ref.describe_model(model_uid)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
File "/Users/bojunfeng/cs/inference/xinference/core/utils.py", line 27, in wrapped
ret = await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/cs/inference/xinference/core/supervisor.py", line 300, in describe_model
raise ValueError(f"Model not found in the model list, uid: {model_uid}")
ValueError: Model not found in the model list, uid: e5cf40e0-63cb-11ee-b038-c1055c423403
2023-10-05 17:09:50,573 urllib3.connectionpool 42089 DEBUG http://127.0.0.1:9997 "GET /v1/models/e5cf40e0-63cb-11ee-b038-c1055c423403 HTTP/1.1" 400 89
2023-10-05 17:09:50,573 xinference.core.restful_api 42089 ERROR Failed to get the model description, detail: Model not found in the model list, uid: e5cf40e0-63cb-11ee-b038-c1055c423403
Traceback (most recent call last):
File "/Users/bojunfeng/cs/inference/xinference/core/restful_api.py", line 453, in build_interface
gr.mount_gradio_app(self._app, interface.build(), f"/{model_uid}")
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/cs/inference/xinference/core/chat_interface.py", line 36, in build
model = self.client.get_model(self.model_uid)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/cs/inference/xinference/client.py", line 883, in get_model
raise RuntimeError(
RuntimeError: Failed to get the model description, detail: Model not found in the model list, uid: e5cf40e0-63cb-11ee-b038-c1055c423403
Yeah. I've also tried to run this model with PytorchModel, and instead of the error above I got garbled text. You may want to check the version of transformers and write another generate method the way falcon does.
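As a quick sanity check before digging into generation quality, something like this sketch can confirm the installed Transformers version (the 4.34.0 threshold is an assumption, based on the tokenizer fix discussed below):

from packaging import version
import transformers

# Tokenizer behavior differs noticeably across releases, so print the
# running version first.
print(transformers.__version__)
if version.parse(transformers.__version__) < version.parse("4.34.0"):
    print("Consider upgrading: pip install -U 'transformers>=4.34.0'")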
I found the solution. We were encountering two different issues.
Regarding the ModuleNotFoundError: I named the model Phi-1.5 instead of Phi-1_5. Hugging Face at one point converts the slashes of the cached directory path into periods when building the dynamic module name (link), so the period in Phi-1.5 is parsed as a package boundary and the import fails with No module named 'transformers_modules.phi-1'.
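A minimal standard-library sketch of that failure mode (the directory layout and module names below are illustrative, mirroring how the phi-1.5 folder is cached):

import importlib, os, sys, tempfile

# Recreate the dynamic-module layout on disk, with a model directory
# whose name contains a period, as "phi-1.5" did.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "transformers_modules", "phi-1.5")
os.makedirs(pkg)
for path in (os.path.join(root, "transformers_modules", "__init__.py"),
             os.path.join(pkg, "__init__.py"),
             os.path.join(pkg, "modeling.py")):
    open(path, "w").close()

sys.path.insert(0, root)
# The file path becomes a dotted module name, so the "." in "phi-1.5" is
# misread as a package boundary: Python looks for a package "phi-1".
try:
    importlib.import_module("transformers_modules.phi-1.5.modeling")
except ModuleNotFoundError as e:
    print(e)  # No module named 'transformers_modules.phi-1'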
Regarding the garbled text, a recent update in Transformers 4.34.0 (link) fixed bugs in the tokenizers and addressed the issue. Given the same prompt (function name and docstring), the two versions behave drastically differently:
Example Generations
Transformers version 4.32.1:
def print_prime(n):
"""
Print all primes between 1 and n
"""
self. While others like that she discovered that he wanted to the other important
Transformers version 4.34.0:
def print_prime(n):
"""
Print all primes between 1 and n
"""
# Initialize an array of all numbers
all_numbers = [i for i in range(1, n + 1)]
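For context, a sketch along the lines of the microsoft/phi-1_5 model card that could produce generations like the ones above (the exact prompt and max_length are assumptions; trust_remote_code=True is needed because the checkpoint ships its own MixFormer modeling code, as the download log above shows):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Both calls pull in the custom configuration/modeling files cached
# alongside the weights.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)

prompt = 'def print_prime(n):\n   """\n   Print all primes between 1 and n\n   """'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.batch_decode(outputs)[0])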
Actually, after some more testing with prompts from the Phi-1.5 paper, it seems the tokenizer problem is still not fully solved. There is still an open PR in the transformers repo working on it, so if we keep using the same library we may have to wait for another release.