Error when using MPS: "Expected elements.dtype() == test_elements.dtype() to be true, but got false."
Self Checks
- [X] This template is only for bug reports. For questions, please visit Discussions.
- [X] I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] [FOR CHINESE USERS] Please be sure to submit issues in English, or they will be closed. Thanks! :)
- [X] Please do not modify this template and fill in all required fields.
Cloud or Self Hosted
Self Hosted (Source)
Environment Details
macOS 15.2 (Apple M3), Python 3.10, torch 2.4.1
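In case it helps triage, the environment above can be confirmed with a quick stdlib/torch check:

```python
import platform

import torch

# Confirm the environment this report was gathered on.
print("macOS:", platform.mac_ver()[0])                      # 15.2
print("torch:", torch.__version__)                          # 2.4.1
print("MPS available:", torch.backends.mps.is_available())  # True
print("MPS built:", torch.backends.mps.is_built())          # True
```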
Steps to Reproduce
- Run the command:

```bash
python tools/api_server.py
```
✔️ Expected Behavior
The server starts using MPS
❌ Actual Behavior
When using MPS, the server crashes during warm-up:

```
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py:265: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn(
INFO: Started server process [98040]
INFO: Waiting for application startup.
2024-12-27 15:43:48.722 | INFO | tools.server.model_manager:__init__:41 - mps is available, running on mps.
2024-12-27 15:43:57.305 | INFO | tools.llama.generate:load_model:682 - Restored model from checkpoint
2024-12-27 15:43:57.306 | INFO | tools.llama.generate:load_model:688 - Using DualARTransformer
2024-12-27 15:43:57.312 | INFO | tools.server.model_manager:load_llama_model:100 - LLAMA model loaded.
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:445: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:630: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:147: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/lookup_free_quantization.py:209: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
2024-12-27 15:43:58.504 | INFO | tools.vqgan.inference:load_model:43 - Loaded model: <All keys matched successfully>
2024-12-27 15:43:58.505 | INFO | tools.server.model_manager:load_decoder_model:108 - Decoder model loaded.
2024-12-27 15:43:58.517 | INFO | tools.llama.generate:generate_long:789 - Encoded text: Hello world.
2024-12-27 15:43:58.519 | INFO | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1
ERROR:    Traceback (most recent call last):
  File "/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/kui/asgi/lifespan.py", line 36, in __call__
    await result
  File "/Users/leaf/Demo/fish-speech/tools/api_server.py", line 77, in initialize_app
    app.state.model_manager = ModelManager(
  File "/Users/leaf/Demo/fish-speech/tools/server/model_manager.py", line 66, in __init__
    self.warm_up(self.tts_inference_engine)
  File "/Users/leaf/Demo/fish-speech/tools/server/model_manager.py", line 122, in warm_up
    list(inference(request, tts_inference_engine))
  File "/Users/leaf/Demo/fish-speech/tools/server/inference.py", line 25, in inference_wrapper
    raise HTTPException(
baize.exceptions.HTTPException: (<HTTPStatus.INTERNAL_SERVER_ERROR: 500>, "'Expected elements.dtype() == test_elements.dtype() to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)'")
ERROR:    Application startup failed. Exiting.
```
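The error text is the default TORCH_CHECK message for `elements.dtype() == test_elements.dtype()`, and `elements`/`test_elements` are the argument names of `torch.isin`, so my guess is that somewhere in the generation/stop-token path two tensors end up with different integer dtypes on MPS. A minimal sketch of the suspected mismatch (I have not confirmed the exact call site in fish-speech):

```python
import torch

# Mismatched integer dtypes: on backends whose isin kernel does not
# promote dtypes, this is what raises the error quoted above.
elements = torch.tensor([1, 2, 3], dtype=torch.int32)
test_elements = torch.tensor([2], dtype=torch.int64)

# Aligning the dtypes first sidesteps the check either way:
mask = torch.isin(elements, test_elements.to(elements.dtype))
print(mask)  # tensor([False,  True, False])
```

If that is indeed the failing call, casting one side to the other's dtype before `torch.isin` would presumably avoid the crash on MPS.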
When I manually modify tools/server/model_manager.py to skip the MPS branch:

```python
# Check if MPS or CUDA is available
# if torch.backends.mps.is_available():
if False:
self.device = "mps"
logger.info("mps is available, running on mps.")
elif not torch.cuda.is_available():
self.device = "cpu"
logger.info("CUDA is not available, running on CPU.")
```

and run `python tools/api_server.py` again, it works:

```
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py:265: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn(
INFO: Started server process [98127]
INFO: Waiting for application startup.
2024-12-27 15:48:35.950 | INFO | tools.server.model_manager:__init__:44 - CUDA is not available, running on CPU.
2024-12-27 15:48:43.325 | INFO | tools.llama.generate:load_model:682 - Restored model from checkpoint
2024-12-27 15:48:43.325 | INFO | tools.llama.generate:load_model:688 - Using DualARTransformer
2024-12-27 15:48:43.328 | INFO | tools.server.model_manager:load_llama_model:100 - LLAMA model loaded.
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:445: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:630: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:147: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/lookup_free_quantization.py:209: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
2024-12-27 15:48:44.036 | INFO | tools.vqgan.inference:load_model:43 - Loaded model: <All keys matched successfully>
2024-12-27 15:48:44.036 | INFO | tools.server.model_manager:load_decoder_model:108 - Decoder model loaded.
2024-12-27 15:48:44.042 | INFO | tools.llama.generate:generate_long:789 - Encoded text: Hello world.
2024-12-27 15:48:44.042 | INFO | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1
3%|███▍ | 29/1023 [00:02<01:35, 10.37it/s]
2024-12-27 15:48:48.053 | INFO | tools.llama.generate:generate_long:861 - Generated 31 tokens in 4.01 seconds, 7.73 tokens/sec
2024-12-27 15:48:48.054 | INFO | tools.llama.generate:generate_long:864 - Bandwidth achieved: 4.93 GB/s
2024-12-27 15:48:48.066 | INFO | tools.inference_engine.vq_manager:decode_vq_tokens:20 - VQ features: torch.Size([8, 30])
2024-12-27 15:48:48.524 | INFO | tools.server.model_manager:warm_up:123 - Models warmed up.
2024-12-27 15:48:48.524 | INFO | __main__:initialize_app:88 - Startup done, listening server at http://127.0.0.1:8080
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8080 (Press CTRL+C to quit)
```

All functions work properly on CPU.
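Until the MPS path is fixed, it might be cleaner to make the device selectable instead of editing the source. A small sketch of the idea (the `FISH_USE_MPS` variable is hypothetical, not an existing option):

```python
import os

import torch


def select_device() -> str:
    """Pick an inference device, treating MPS as opt-in only."""
    # FISH_USE_MPS is a hypothetical opt-in flag, for illustration.
    if torch.backends.mps.is_available() and os.environ.get("FISH_USE_MPS") == "1":
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


# ModelManager.__init__ could then simply do: self.device = select_device()
```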