I tried to run the script on my macOS (M2) machine as follows:
CUDA_VISIBLE_DEVICES='mps:0' accelerate launch bins/tts/inference.py \
--config "ckpts/tts/valle_libritts/args.json" \
--log_level debug \
--acoustics_dir ckpts/tts/valle_libritts \
--output_dir ckpts/tts/valle_libritts/result \
--mode "single" \
--text "his is a clip of generated speech with the given text from Amphion Vall-E mode" \
--text_prompt "many animals of even complex structure which live parasitically within others are wholly devoid of an alimentary cavity" \
--audio_prompt ckpts/tts/valle_libritts/prompt/LJ025-0076.wav \
--test_list_file None
But I got the following error messages:
2023-12-25 21:44:01 | WARNING | phonemizer | words count mismatch on 100.0% of the lines (1/1)
Traceback (most recent call last):
File "/Users/hehongshu/dev/llm/Amphion/bins/tts/inference.py", line 167, in <module>
main()
File "/Users/hehongshu/dev/llm/Amphion/bins/tts/inference.py", line 163, in main
inferencer.inference()
File "/Users/hehongshu/dev/llm/Amphion/models/tts/base/tts_inferece.py", line 173, in inference
pred_audio = self.inference_for_single_utterance()
File "/Users/hehongshu/dev/llm/Amphion/models/tts/valle/valle_inference.py", line 127, in inference_for_single_utterance
audio = self.inference_one_clip(text, text_prompt, audio_file)
File "/Users/hehongshu/dev/llm/Amphion/models/tts/valle/valle_inference.py", line 109, in inference_one_clip
samples = self.audio_tokenizer.decode([(encoded_frames.transpose(2, 1), None)])
File "/Users/hehongshu/dev/llm/Amphion/utils/tokenizer.py", line 68, in decode
return self.codec.decode(frames)
File "/Users/hehongshu/anaconda3/envs/amphion/lib/python3.9/site-packages/encodec/model.py", line 175, in decode
return self._decode_frame(encoded_frames[0])
File "/Users/hehongshu/anaconda3/envs/amphion/lib/python3.9/site-packages/encodec/model.py", line 183, in _decode_frame
emb = self.quantizer.decode(codes)
File "/Users/hehongshu/anaconda3/envs/amphion/lib/python3.9/site-packages/encodec/quantization/vq.py", line 112, in decode
quantized = self.vq.decode(codes)
File "/Users/hehongshu/anaconda3/envs/amphion/lib/python3.9/site-packages/encodec/quantization/core_vq.py", line 361, in decode
quantized = layer.decode(indices)
File "/Users/hehongshu/anaconda3/envs/amphion/lib/python3.9/site-packages/encodec/quantization/core_vq.py", line 288, in decode
quantize = self._codebook.decode(embed_ind)
File "/Users/hehongshu/anaconda3/envs/amphion/lib/python3.9/site-packages/encodec/quantization/core_vq.py", line 202, in decode
quantize = self.dequantize(embed_ind)
File "/Users/hehongshu/anaconda3/envs/amphion/lib/python3.9/site-packages/encodec/quantization/core_vq.py", line 188, in dequantize
quantize = F.embedding(embed_ind, self.embed)
File "/Users/hehongshu/anaconda3/envs/amphion/lib/python3.9/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!
Traceback (most recent call last):
File "/Users/hehongshu/anaconda3/envs/amphion/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/Users/hehongshu/anaconda3/envs/amphion/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/Users/hehongshu/anaconda3/envs/amphion/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
simple_launcher(args)
File "/Users/hehongshu/anaconda3/envs/amphion/lib/python3.9/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/Users/hehongshu/anaconda3/envs/amphion/bin/python', 'bins/tts/inference.py', '--config', 'ckpts/tts/valle_libritts/args.json', '--log_level', 'debug', '--acoustics_dir', 'ckpts/tts/valle_libritts', '--output_dir', 'ckpts/tts/valle_libritts/result', '--mode', 'single', '--text', 'his is a clip of generated speech with the given text from Amphion Vall-E mode', '--text_prompt', 'many animals of even complex structure which live parasitically within others are wholly devoid of an alimentary cavity', '--audio_prompt', 'ckpts/tts/valle_libritts/prompt/LJ025-0076.wav', '--test_list_file', 'None']' returned non-zero exit status 1.
I noticed that you are on macOS with MPS but are still using CUDA_VISIBLE_DEVICES
to select the device. That variable only controls which CUDA GPUs are visible, so it has no effect on MPS. It might be worth setting PYTORCH_ENABLE_MPS_FALLBACK=1,
which lets operators that MPS does not support fall back to the CPU. However, please note that there may still be compatibility issues, since Amphion is primarily developed for Linux with GPU devices.
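For reference, "Placeholder storage has not been allocated on MPS device!" is commonly raised when one tensor in an operation lives on the CPU while another lives on MPS. The traceback ends in `F.embedding(embed_ind, self.embed)`, so a mismatch between the index tensor and the codebook is a plausible cause, although I have not confirmed that this is what happens inside Amphion or EnCodec here. Below is a minimal sketch of the general pattern only. The `pick_device` helper, the tensor names, and the shapes are made up for illustration and are not taken from the Amphion code.

```python
import torch
import torch.nn.functional as F


def pick_device() -> torch.device:
    """Hypothetical helper: prefer CUDA, then MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()

# Stand-in for a codebook that has been moved to the accelerator.
embed = torch.randn(1024, 128, device=device)

# Stand-in for code indices that were created on the CPU.
indices = torch.randint(0, 1024, (1, 75))

# Move the indices to wherever the codebook lives before the lookup.
# On an MPS machine, skipping this step is the kind of mismatch that
# can produce the "Placeholder storage" error.
indices = indices.to(embed.device)

quantized = F.embedding(indices, embed)
print(quantized.shape, quantized.device)
```

If the mismatch is indeed the cause, the equivalent fix in the real code would be to make sure the codec and the encoded frames end up on the same device before `decode` is called.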
Hi @MonroeHe, if you have any further questions about training on an MPS device, feel free to re-open this issue. We are glad to follow up!