
Error when using MPS: "Expected elements.dtype() == test_elements.dtype() to be true, but got false."

Open · LeafYeeXYZ opened this issue 2 months ago · 4 comments

Self Checks

  • [X] This template is only for bug reports. For questions, please visit Discussions.
  • [X] I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] [FOR CHINESE USERS] Please submit issues in English, or they will be closed. Thank you! :)
  • [X] Please do not modify this template and fill in all required fields.

Cloud or Self Hosted

Self Hosted (Source)

Environment Details

macOS 15.2 (Apple M3), Python 3.10, torch 2.4.1

Steps to Reproduce

  1. Run the command python tools/api_server.py

✔️ Expected Behavior

The server starts and runs inference on MPS.

❌ Actual Behavior

When using MPS, startup fails during model warm-up:

/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py:265: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn(
INFO:     Started server process [98040]
INFO:     Waiting for application startup.
2024-12-27 15:43:48.722 | INFO     | tools.server.model_manager:__init__:41 - mps is available, running on mps.
2024-12-27 15:43:57.305 | INFO     | tools.llama.generate:load_model:682 - Restored model from checkpoint
2024-12-27 15:43:57.306 | INFO     | tools.llama.generate:load_model:688 - Using DualARTransformer
2024-12-27 15:43:57.312 | INFO     | tools.server.model_manager:load_llama_model:100 - LLAMA model loaded.
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:445: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:630: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:147: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/lookup_free_quantization.py:209: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
2024-12-27 15:43:58.504 | INFO     | tools.vqgan.inference:load_model:43 - Loaded model: <All keys matched successfully>
2024-12-27 15:43:58.505 | INFO     | tools.server.model_manager:load_decoder_model:108 - Decoder model loaded.
2024-12-27 15:43:58.517 | INFO     | tools.llama.generate:generate_long:789 - Encoded text: Hello world.
2024-12-27 15:43:58.519 | INFO     | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1
ERROR:    Traceback (most recent call last):
  File "/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/kui/asgi/lifespan.py", line 36, in __call__
    await result
  File "/Users/leaf/Demo/fish-speech/tools/api_server.py", line 77, in initialize_app
    app.state.model_manager = ModelManager(
  File "/Users/leaf/Demo/fish-speech/tools/server/model_manager.py", line 66, in __init__
    self.warm_up(self.tts_inference_engine)
  File "/Users/leaf/Demo/fish-speech/tools/server/model_manager.py", line 122, in warm_up
    list(inference(request, tts_inference_engine))
  File "/Users/leaf/Demo/fish-speech/tools/server/inference.py", line 25, in inference_wrapper
    raise HTTPException(
baize.exceptions.HTTPException: (<HTTPStatus.INTERNAL_SERVER_ERROR: 500>, "'Expected elements.dtype() == test_elements.dtype() to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)'")

ERROR:    Application startup failed. Exiting.
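For context: the parameter names in the error (elements, test_elements) match the signature of torch.isin, which raises exactly this check when its two tensor arguments have different dtypes. The MPS implementation appears to be stricter than the CPU one here, which would explain why the same code runs fine on CPU (see below), presumably via a torch.isin call somewhere in the generation path. A minimal standalone sketch (hypothetical, not fish-speech code) that should reproduce the message on an Apple Silicon machine:

import torch

# Hypothetical repro of the same PyTorch check, independent of fish-speech.
# The MPS implementation of torch.isin appears to require both arguments to
# share a dtype, while the CPU implementation promotes them.
device = "mps" if torch.backends.mps.is_available() else "cpu"

elements = torch.tensor([1, 2, 3], dtype=torch.int64, device=device)
test_elements = torch.tensor([2, 3], dtype=torch.int32, device=device)

# On MPS this raises:
#   RuntimeError: Expected elements.dtype() == test_elements.dtype() to be
#   true, but got false.
print(torch.isin(elements, test_elements))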

When I manually modify tools/server/model_manager.py to disable the MPS branch:

        # Check if MPS or CUDA is available
        # if torch.backends.mps.is_available():
        if False:  # MPS check disabled by hand to force the fallback below
            self.device = "mps"
            logger.info("mps is available, running on mps.")
        elif not torch.cuda.is_available():
            self.device = "cpu"
            logger.info("CUDA is not available, running on CPU.")

and run python tools/api_server.py again, it works:

/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py:265: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn(
INFO:     Started server process [98127]
INFO:     Waiting for application startup.
2024-12-27 15:48:35.950 | INFO     | tools.server.model_manager:__init__:44 - CUDA is not available, running on CPU.
2024-12-27 15:48:43.325 | INFO     | tools.llama.generate:load_model:682 - Restored model from checkpoint
2024-12-27 15:48:43.325 | INFO     | tools.llama.generate:load_model:688 - Using DualARTransformer
2024-12-27 15:48:43.328 | INFO     | tools.server.model_manager:load_llama_model:100 - LLAMA model loaded.
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:445: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:630: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:147: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/lookup_free_quantization.py:209: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
2024-12-27 15:48:44.036 | INFO     | tools.vqgan.inference:load_model:43 - Loaded model: <All keys matched successfully>
2024-12-27 15:48:44.036 | INFO     | tools.server.model_manager:load_decoder_model:108 - Decoder model loaded.
2024-12-27 15:48:44.042 | INFO     | tools.llama.generate:generate_long:789 - Encoded text: Hello world.
2024-12-27 15:48:44.042 | INFO     | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1
  3%|███▍                                                                                                                    | 29/1023 [00:02<01:35, 10.37it/s]
2024-12-27 15:48:48.053 | INFO     | tools.llama.generate:generate_long:861 - Generated 31 tokens in 4.01 seconds, 7.73 tokens/sec
2024-12-27 15:48:48.054 | INFO     | tools.llama.generate:generate_long:864 - Bandwidth achieved: 4.93 GB/s
2024-12-27 15:48:48.066 | INFO     | tools.inference_engine.vq_manager:decode_vq_tokens:20 - VQ features: torch.Size([8, 30])
2024-12-27 15:48:48.524 | INFO     | tools.server.model_manager:warm_up:123 - Models warmed up.
2024-12-27 15:48:48.524 | INFO     | __main__:initialize_app:88 - Startup done, listening server at http://127.0.0.1:8080
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8080 (Press CTRL+C to quit)

All functions work properly on CPU.
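In case editing the source by hand is too invasive as a workaround, the device selection could also be made overridable from the environment. This is only a sketch: FISH_DEVICE is an invented variable name, not an existing fish-speech option.

import os
import torch
from loguru import logger

# Hypothetical override for the device-selection logic in
# tools/server/model_manager.py. FISH_DEVICE is an invented name;
# e.g. FISH_DEVICE=cpu would force the CPU path without commenting
# out the MPS check.
forced = os.environ.get("FISH_DEVICE")
if forced is not None:
    device = forced
elif torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"
logger.info(f"Running on {device}.")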

LeafYeeXYZ · Dec 27 '24, 07:12