FEAT: Support Phi-1 & Phi-1.5
Resolve #462
Hmm, it seems Phi-1.5 cannot be added directly as a PyTorch model and run; some additional glue code might be needed.
Got the following error when trying to run the model with default settings:
ModuleNotFoundError: [address=127.0.0.1:56946, pid=42111] No module named 'transformers_modules.phi-1'
Full Log
2023-10-05 17:09:40,791 xinference 42089 INFO Xinference successfully started. Endpoint: http://127.0.0.1:9997
2023-10-05 17:09:40,792 xinference.core.worker 42089 DEBUG Worker actor initialized with main pool: 127.0.0.1:21605
2023-10-05 17:09:40,792 xinference.core.supervisor 42089 DEBUG Enter add_worker, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, '127.0.0.1:21605'), kwargs: {}
2023-10-05 17:09:40,792 xinference.core.supervisor 42089 INFO Worker 127.0.0.1:21605 has been added successfully
2023-10-05 17:09:40,792 xinference.core.supervisor 42089 DEBUG Leave add_worker, elapsed time: 0 ms
2023-10-05 17:09:40,793 xinference.deploy.worker 42089 INFO Xinference worker successfully started.
2023-10-05 17:09:41,139 xinference.core.supervisor 42089 DEBUG Enter list_model_registrations, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM'), kwargs: {}
2023-10-05 17:09:41,139 xinference.core.supervisor 42089 DEBUG Leave list_model_registrations, elapsed time: 0 ms
2023-10-05 17:09:41,207 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'baichuan'), kwargs: {}
2023-10-05 17:09:41,207 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,208 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'baichuan-2'), kwargs: {}
2023-10-05 17:09:41,208 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,209 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'baichuan-2-chat'), kwargs: {}
2023-10-05 17:09:41,209 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,210 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'baichuan-chat'), kwargs: {}
2023-10-05 17:09:41,210 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,211 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'chatglm'), kwargs: {}
2023-10-05 17:09:41,211 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,211 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'chatglm2'), kwargs: {}
2023-10-05 17:09:41,211 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,212 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'chatglm2-32k'), kwargs: {}
2023-10-05 17:09:41,212 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,213 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'code-llama'), kwargs: {}
2023-10-05 17:09:41,213 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,214 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'code-llama-instruct'), kwargs: {}
2023-10-05 17:09:41,214 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,218 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'code-llama-python'), kwargs: {}
2023-10-05 17:09:41,218 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,219 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'falcon'), kwargs: {}
2023-10-05 17:09:41,219 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,220 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'falcon-instruct'), kwargs: {}
2023-10-05 17:09:41,220 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,220 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'glaive-coder'), kwargs: {}
2023-10-05 17:09:41,220 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,221 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'gpt-2'), kwargs: {}
2023-10-05 17:09:41,221 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,221 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'internlm-20b'), kwargs: {}
2023-10-05 17:09:41,221 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,221 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'internlm-7b'), kwargs: {}
2023-10-05 17:09:41,221 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,227 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'internlm-chat-20b'), kwargs: {}
2023-10-05 17:09:41,228 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,228 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'internlm-chat-7b'), kwargs: {}
2023-10-05 17:09:41,228 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,229 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'llama-2'), kwargs: {}
2023-10-05 17:09:41,229 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,229 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'llama-2-chat'), kwargs: {}
2023-10-05 17:09:41,229 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,230 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'OpenBuddy'), kwargs: {}
2023-10-05 17:09:41,230 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,230 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'opt'), kwargs: {}
2023-10-05 17:09:41,230 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,230 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'orca'), kwargs: {}
2023-10-05 17:09:41,230 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,233 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'phi-1.5'), kwargs: {}
2023-10-05 17:09:41,233 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,234 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'qwen-chat'), kwargs: {}
2023-10-05 17:09:41,234 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,234 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'starchat-beta'), kwargs: {}
2023-10-05 17:09:41,234 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,235 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'starcoder'), kwargs: {}
2023-10-05 17:09:41,235 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,235 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'starcoderplus'), kwargs: {}
2023-10-05 17:09:41,235 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,235 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'tiny-llama'), kwargs: {}
2023-10-05 17:09:41,235 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,240 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'vicuna-v1.3'), kwargs: {}
2023-10-05 17:09:41,240 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,243 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'vicuna-v1.5'), kwargs: {}
2023-10-05 17:09:41,243 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,244 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'vicuna-v1.5-16k'), kwargs: {}
2023-10-05 17:09:41,244 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,245 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'wizardlm-v1.0'), kwargs: {}
2023-10-05 17:09:41,245 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:41,245 xinference.core.supervisor 42089 DEBUG Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'LLM', 'wizardmath-v1.0'), kwargs: {}
2023-10-05 17:09:41,245 xinference.core.supervisor 42089 DEBUG Leave get_model_registration, elapsed time: 0 ms
2023-10-05 17:09:48,019 xinference.core.supervisor 42089 DEBUG Enter launch_builtin_model, model_uid: e5cf40e0-63cb-11ee-b038-c1055c423403, model_name: phi-1.5, model_size: 1, model_format: pytorch, quantization: none, replica: 1
2023-10-05 17:09:48,019 xinference.core.worker 42089 DEBUG Enter get_model_count, args: (<xinference.core.worker.WorkerActor object at 0x1597319d0>,), kwargs: {}
2023-10-05 17:09:48,019 xinference.core.worker 42089 DEBUG Leave get_model_count, elapsed time: 0 ms
2023-10-05 17:09:48,019 xinference.core.worker 42089 DEBUG Enter launch_builtin_model, args: (<xinference.core.worker.WorkerActor object at 0x1597319d0>,), kwargs: {'model_uid': 'e5cf40e0-63cb-11ee-b038-c1055c423403-1-0', 'model_name': 'phi-1.5', 'model_size_in_billions': 1, 'model_format': 'pytorch', 'quantization': 'none', 'model_type': 'LLM', 'n_gpu': 'auto'}
2023-10-05 17:09:48,019 xinference.core.supervisor 42089 DEBUG Enter is_local_deployment, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>,), kwargs: {}
2023-10-05 17:09:48,019 xinference.core.supervisor 42089 DEBUG Leave is_local_deployment, elapsed time: 0 ms
2023-10-05 17:09:48,024 xinference.model.llm.llm_family 42089 INFO Caching from Hugging Face: microsoft/phi-1_5
2023-10-05 17:09:48,043 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,243 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "GET /api/models/microsoft/phi-1_5/revision/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef HTTP/1.1" 200 2363
Fetching 14 files: 0%| | 0/14 [00:00<?, ?it/s]2023-10-05 17:09:48,271 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,272 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,273 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,275 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,276 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,278 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,280 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,280 urllib3.connectionpool 42089 DEBUG Starting new HTTPS connection (1): huggingface.co:443
2023-10-05 17:09:48,399 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/configuration_mixformer_sequential.py HTTP/1.1" 200 0
2023-10-05 17:09:48,399 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/config.json HTTP/1.1" 200 0
2023-10-05 17:09:48,400 filelock 42089 DEBUG Attempting to acquire lock 5798238032 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/c2b5ff89977b9726d5c3e54c28e17aa36d83f268.lock
2023-10-05 17:09:48,400 filelock 42089 DEBUG Attempting to acquire lock 5798347600 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/8cc2d51cba96dbebf98898e731cca1d9c5977f71.lock
2023-10-05 17:09:48,400 filelock 42089 DEBUG Lock 5798238032 acquired on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/c2b5ff89977b9726d5c3e54c28e17aa36d83f268.lock
2023-10-05 17:09:48,400 filelock 42089 DEBUG Lock 5798347600 acquired on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/8cc2d51cba96dbebf98898e731cca1d9c5977f71.lock
2023-10-05 17:09:48,404 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/merges.txt HTTP/1.1" 200 0
2023-10-05 17:09:48,404 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/generation_config.json HTTP/1.1" 200 0
2023-10-05 17:09:48,404 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/README.md HTTP/1.1" 200 0
2023-10-05 17:09:48,405 filelock 42089 DEBUG Attempting to acquire lock 5795770704 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/6f26581545cae8f8f375c5f0f90d956c194a20fd.lock
2023-10-05 17:09:48,405 filelock 42089 DEBUG Lock 5795770704 acquired on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/6f26581545cae8f8f375c5f0f90d956c194a20fd.lock
2023-10-05 17:09:48,409 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/Research%20License.docx HTTP/1.1" 200 0
2023-10-05 17:09:48,409 filelock 42089 DEBUG Attempting to acquire lock 5788588304 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/390505dd1ab349a07cf9764b9dc733d28ea28385.lock
2023-10-05 17:09:48,409 filelock 42089 DEBUG Lock 5788588304 acquired on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/390505dd1ab349a07cf9764b9dc733d28ea28385.lock
2023-10-05 17:09:48,413 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/.gitattributes HTTP/1.1" 200 0
Fetching 14 files: 7%|███▋ | 1/14 [00:00<00:01, 7.05it/s]2023-10-05 17:09:48,444 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "GET /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/config.json HTTP/1.1" 200 707
Downloading (…)0e7049ef/config.json: 100%|███████████████████████████████| 707/707 [00:00<00:00, 6.15MB/s]
2023-10-05 17:09:48,445 filelock 42089 DEBUG Attempting to release lock 5798238032 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/c2b5ff89977b9726d5c3e54c28e17aa36d83f268.lock
2023-10-05 17:09:48,445 filelock 42089 DEBUG Lock 5798238032 released on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/c2b5ff89977b9726d5c3e54c28e17aa36d83f268.lock
2023-10-05 17:09:48,446 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "GET /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/configuration_mixformer_sequential.py HTTP/1.1" 200 1860
Downloading (…)former_sequential.py: 100%|███████████████████████████| 1.86k/1.86k [00:00<00:00, 28.2MB/s]
2023-10-05 17:09:48,447 filelock 42089 DEBUG Attempting to release lock 5798347600 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/8cc2d51cba96dbebf98898e731cca1d9c5977f71.lock
2023-10-05 17:09:48,447 filelock 42089 DEBUG Lock 5798347600 released on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/8cc2d51cba96dbebf98898e731cca1d9c5977f71.lock
2023-10-05 17:09:48,451 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/modeling_mixformer_sequential.py HTTP/1.1" 200 0
2023-10-05 17:09:48,451 filelock 42089 DEBUG Attempting to acquire lock 5798359312 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/7d4f7229ad6e5f85e7ff4fba20847d4052bb74d2.lock
2023-10-05 17:09:48,451 filelock 42089 DEBUG Lock 5798359312 acquired on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/7d4f7229ad6e5f85e7ff4fba20847d4052bb74d2.lock
2023-10-05 17:09:48,452 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "GET /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/README.md HTTP/1.1" 200 8001
Downloading (…)5a0e7049ef/README.md: 100%|███████████████████████████| 8.00k/8.00k [00:00<00:00, 46.5MB/s]
2023-10-05 17:09:48,454 filelock 42089 DEBUG Attempting to release lock 5795770704 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/6f26581545cae8f8f375c5f0f90d956c194a20fd.lock
2023-10-05 17:09:48,454 filelock 42089 DEBUG Lock 5795770704 released on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/6f26581545cae8f8f375c5f0f90d956c194a20fd.lock
2023-10-05 17:09:48,454 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/pytorch_model.bin HTTP/1.1" 302 0
2023-10-05 17:09:48,457 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "GET /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/Research%20License.docx HTTP/1.1" 200 38892
2023-10-05 17:09:48,460 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/special_tokens_map.json HTTP/1.1" 200 0
Downloading (…)earch%20License.docx: 100%|███████████████████████████| 38.9k/38.9k [00:00<00:00, 17.6MB/s]
2023-10-05 17:09:48,461 filelock 42089 DEBUG Attempting to release lock 5788588304 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/390505dd1ab349a07cf9764b9dc733d28ea28385.lock
2023-10-05 17:09:48,461 filelock 42089 DEBUG Lock 5788588304 released on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/390505dd1ab349a07cf9764b9dc733d28ea28385.lock
2023-10-05 17:09:48,479 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/added_tokens.json HTTP/1.1" 200 0
2023-10-05 17:09:48,489 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/tokenizer.json HTTP/1.1" 200 0
2023-10-05 17:09:48,491 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/tokenizer_config.json HTTP/1.1" 200 0
2023-10-05 17:09:48,496 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "HEAD /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/vocab.json HTTP/1.1" 200 0
2023-10-05 17:09:48,500 urllib3.connectionpool 42089 DEBUG https://huggingface.co:443 "GET /microsoft/phi-1_5/resolve/b6a7e2fe15c21f5847279f23e280cc5a0e7049ef/modeling_mixformer_sequential.py HTTP/1.1" 200 28749
Downloading (…)former_sequential.py: 100%|████████████████████████████| 28.7k/28.7k [00:00<00:00, 142MB/s]
2023-10-05 17:09:48,501 filelock 42089 DEBUG Attempting to release lock 5798359312 on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/7d4f7229ad6e5f85e7ff4fba20847d4052bb74d2.lock
2023-10-05 17:09:48,501 filelock 42089 DEBUG Lock 5798359312 released on /Users/bojunfeng/.cache/huggingface/hub/models--microsoft--phi-1_5/blobs/7d4f7229ad6e5f85e7ff4fba20847d4052bb74d2.lock
Fetching 14 files: 100%|██████████████████████████████████████████████████| 14/14 [00:00<00:00, 61.05it/s]
2023-10-05 17:09:48,502 xinference.model.llm.core 42089 DEBUG Launching e5cf40e0-63cb-11ee-b038-c1055c423403-1-0 with PytorchModel
2023-10-05 17:09:50,561 xinference.core.supervisor 42089 DEBUG Enter terminate_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'e5cf40e0-63cb-11ee-b038-c1055c423403'), kwargs: {'suppress_exception': True}
2023-10-05 17:09:50,561 xinference.core.supervisor 42089 DEBUG Leave terminate_model, elapsed time: 0 ms
2023-10-05 17:09:50,561 xinference.core.restful_api 42089 ERROR [address=127.0.0.1:56946, pid=42111] No module named 'transformers_modules.phi-1'
Traceback (most recent call last):
File "/Users/bojunfeng/cs/inference/xinference/core/restful_api.py", line 404, in launch_model
model_uid = await self._supervisor_ref.launch_builtin_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 288, in __pyx_actor_method_wrapper
File "xoscar/core.pyx", line 422, in _handle_actor_result
File "xoscar/core.pyx", line 465, in _run_actor_async_generator
File "xoscar/core.pyx", line 466, in xoscar.core._BaseActor._run_actor_async_generator
File "xoscar/core.pyx", line 471, in xoscar.core._BaseActor._run_actor_async_generator
File "/Users/bojunfeng/cs/inference/xinference/core/supervisor.py", line 227, in launch_builtin_model
yield _launch_one_model(rep_model_uid)
File "xoscar/core.pyx", line 476, in xoscar.core._BaseActor._run_actor_async_generator
File "xoscar/core.pyx", line 422, in _handle_actor_result
File "xoscar/core.pyx", line 465, in _run_actor_async_generator
File "xoscar/core.pyx", line 466, in xoscar.core._BaseActor._run_actor_async_generator
File "xoscar/core.pyx", line 471, in xoscar.core._BaseActor._run_actor_async_generator
File "/Users/bojunfeng/cs/inference/xinference/core/supervisor.py", line 206, in _launch_one_model
yield worker_ref.launch_builtin_model(
File "xoscar/core.pyx", line 476, in xoscar.core._BaseActor._run_actor_async_generator
File "xoscar/core.pyx", line 396, in _handle_actor_result
File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
File "/Users/bojunfeng/cs/inference/xinference/core/utils.py", line 27, in wrapped
ret = await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/cs/inference/xinference/core/worker.py", line 187, in launch_builtin_model
await model_ref.load()
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/xoscar/backends/pool.py", line 657, in send
result = await self._run_coro(message.message_id, coro)
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/xoscar/backends/pool.py", line 368, in _run_coro
return await coro
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/xoscar/api.py", line 306, in __on_receive__
return await super().__on_receive__(message) # type: ignore
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 558, in __on_receive__
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
File "xoscar/core.pyx", line 524, in xoscar.core._BaseActor.__on_receive__
File "/Users/bojunfeng/cs/inference/xinference/core/model.py", line 117, in load
self._model.load()
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/cs/inference/xinference/model/llm/pytorch/core.py", line 205, in load
self._model, self._tokenizer = self._load_model(kwargs)
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/cs/inference/xinference/model/llm/pytorch/core.py", line 124, in _load_model
model = AutoModelForCausalLM.from_pretrained(
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 482, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1016, in from_pretrained
config_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 497, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/anaconda3/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 199, in get_class_in_module
module = importlib.import_module(module_path)
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/anaconda3/lib/python3.11/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1126, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1126, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1140, in _find_and_load_unlocked
ModuleNotFoundError: [address=127.0.0.1:56946, pid=42111] No module named 'transformers_modules.phi-1'
2023-10-05 17:09:50,572 urllib3.connectionpool 42089 DEBUG Starting new HTTP connection (1): 127.0.0.1:9997
2023-10-05 17:09:50,573 xinference.core.supervisor 42089 DEBUG Enter describe_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x1596a79b0>, 'e5cf40e0-63cb-11ee-b038-c1055c423403'), kwargs: {}
2023-10-05 17:09:50,573 xinference.core.restful_api 42089 ERROR Model not found in the model list, uid: e5cf40e0-63cb-11ee-b038-c1055c423403
Traceback (most recent call last):
File "/Users/bojunfeng/cs/inference/xinference/core/restful_api.py", line 361, in describe_model
return await self._supervisor_ref.describe_model(model_uid)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
File "/Users/bojunfeng/cs/inference/xinference/core/utils.py", line 27, in wrapped
ret = await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/cs/inference/xinference/core/supervisor.py", line 300, in describe_model
raise ValueError(f"Model not found in the model list, uid: {model_uid}")
ValueError: Model not found in the model list, uid: e5cf40e0-63cb-11ee-b038-c1055c423403
2023-10-05 17:09:50,573 urllib3.connectionpool 42089 DEBUG http://127.0.0.1:9997 "GET /v1/models/e5cf40e0-63cb-11ee-b038-c1055c423403 HTTP/1.1" 400 89
2023-10-05 17:09:50,573 xinference.core.restful_api 42089 ERROR Failed to get the model description, detail: Model not found in the model list, uid: e5cf40e0-63cb-11ee-b038-c1055c423403
Traceback (most recent call last):
File "/Users/bojunfeng/cs/inference/xinference/core/restful_api.py", line 453, in build_interface
gr.mount_gradio_app(self._app, interface.build(), f"/{model_uid}")
^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/cs/inference/xinference/core/chat_interface.py", line 36, in build
model = self.client.get_model(self.model_uid)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/bojunfeng/cs/inference/xinference/client.py", line 883, in get_model
raise RuntimeError(
RuntimeError: Failed to get the model description, detail: Model not found in the model list, uid: e5cf40e0-63cb-11ee-b038-c1055c423403
Yeah. I've also tried to run this model with PytorchModel, and instead of the error above I got garbled text. You may want to check the version of transformers and write another generate method the way falcon does.
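As a quick sanity check before digging into generation quality, something like this sketch can confirm the installed Transformers version (the 4.34.0 threshold is an assumption, based on the tokenizer fix discussed below):

from packaging import version
import transformers

# Tokenizer behavior differs noticeably across releases, so print the
# running version first.
print(transformers.__version__)
if version.parse(transformers.__version__) < version.parse("4.34.0"):
    print("Consider upgrading: pip install -U 'transformers>=4.34.0'")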
I found the solution. We were encountering two different issues.
Regarding the ModuleNotFoundError: I named the model Phi-1.5 instead of Phi-1_5. Hugging Face at one point converts the slashes of the cached directory path into periods when building the dynamic module name (link), so the period in Phi-1.5 is parsed as a package boundary and the import fails with No module named 'transformers_modules.phi-1'.
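A minimal standard-library sketch of that failure mode (the directory layout and module names below are illustrative, mirroring how the phi-1.5 folder is cached):

import importlib, os, sys, tempfile

# Recreate the dynamic-module layout on disk, with a model directory
# whose name contains a period, as "phi-1.5" did.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "transformers_modules", "phi-1.5")
os.makedirs(pkg)
for path in (os.path.join(root, "transformers_modules", "__init__.py"),
             os.path.join(pkg, "__init__.py"),
             os.path.join(pkg, "modeling.py")):
    open(path, "w").close()

sys.path.insert(0, root)
# The file path becomes a dotted module name, so the "." in "phi-1.5" is
# misread as a package boundary: Python looks for a package "phi-1".
try:
    importlib.import_module("transformers_modules.phi-1.5.modeling")
except ModuleNotFoundError as e:
    print(e)  # No module named 'transformers_modules.phi-1'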
Regarding the garbled text, a recent update in Transformers 4.34.0 (link) fixed bugs in the tokenizers and addressed the issue. Given the same prompt (function name and docstring), the two versions behave drastically differently:
Example Generations
Transformers version 4.32.1:
def print_prime(n):
"""
Print all primes between 1 and n
"""
self. While others like that she discovered that he wanted to the other important
Transformers version 4.34.0:
def print_prime(n):
"""
Print all primes between 1 and n
"""
# Initialize an array of all numbers
all_numbers = [i for i in range(1, n + 1)]
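For context, a sketch along the lines of the microsoft/phi-1_5 model card that could produce generations like the ones above (the exact prompt and max_length are assumptions; trust_remote_code=True is needed because the checkpoint ships its own MixFormer modeling code, as the download log above shows):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Both calls pull in the custom configuration/modeling files cached
# alongside the weights.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)

prompt = 'def print_prime(n):\n   """\n   Print all primes between 1 and n\n   """'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.batch_decode(outputs)[0])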
Actually, after some more testing with prompts from the Phi-1.5 paper, it seems the tokenizer problem is still not fully solved. There is still an open PR in the transformers repo working on it, so if we keep using the same library we may have to wait for another release.