LocalAIVoiceChat
Error running on Mac M2
First of all, awesome repo. I've tried all possible installation combinations and all of them failed. Any suggestions? @KoljaB

Machine: Mac M2
Terminal output:
```
Using model: xtts
Initializing STT AudioToTextRecorder ...
[2024-06-05 15:39:29.914] [ctranslate2] [thread 1054526] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
Select voice (1-5): 1
This is how voice number 1 sounds like
/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/TTS/tts/layers/xtts/stream_generator.py:138: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
General synthesis error: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
 - (Tensor elements, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 - (Number element, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 - (Tensor elements, Number test_element, *, bool assume_unique, bool invert, Tensor out)
occured trying to synthesize text This is how voice number 1 sounds like
Traceback: Traceback (most recent call last):
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 279, in _synthesize_worker
    for i, chunk in enumerate(chunks):
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 643, in inference_stream
    gpt_generator = self.gpt.get_generator(
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt.py", line 603, in get_generator
    return self.gpt_inference.generate_stream(
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/TTS/tts/layers/xtts/stream_generator.py", line 186, in generate
    model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/transformers/generation/utils.py", line 473, in _prepare_attention_mask_for_generation
    torch.isin(elements=inputs, test_elements=pad_token_id).any()
TypeError: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
 - (Tensor elements, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 - (Number element, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 - (Tensor elements, Number test_element, *, bool assume_unique, bool invert, Tensor out)
Error: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
 - (Tensor elements, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 - (Number element, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 - (Tensor elements, Number test_element, *, bool assume_unique, bool invert, Tensor out)
Exception in thread Thread-4 (synthesize_worker):
Traceback (most recent call last):
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/RealtimeTTS/text_to_stream.py", line 201, in synthesize_worker
    self.engine.synthesize(sentence)
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 411, in synthesize
    status, result = self.parent_synthesize_pipe.recv()
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
Accept voice (y/n):
```
GPT-4o output:
It appears that there are several warnings and errors related to initializing the STT (Speech-to-Text) AudioToTextRecorder and selecting the voice. Here are the issues and their potential resolutions:

Compute Type Warning:
```
[warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
```
Resolution: This is a warning indicating that the model initially designed to use float16 precision has been converted to float32 because the device or backend doesn't support float16 efficiently. This is usually not a critical issue, but if you want to optimize performance, consider using hardware that supports float16 or adjust the model configuration to use float32 from the start.
Pretrained Model Configuration Warning:
```
UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
```
Resolution: Update your code to use a generation configuration file as suggested in the warning. This will ensure compatibility with future versions of the library.
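For illustration, the generation-configuration route the warning points at looks roughly like this in transformers (parameter names follow the linked docs; whether Coqui TTS's bundled stream_generator consumes such a config is a separate question):

```python
from transformers import GenerationConfig

# Build generation settings as a standalone config object instead of
# mutating the pretrained model's config in place.
gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# The object can be saved alongside a model as generation_config.json
# and passed to generate(..., generation_config=gen_config).
print(gen_config.temperature)
```

This keeps sampling settings out of the model config, which is the direction the deprecation warning asks for.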
General Synthesis Error:
```
General synthesis error: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
 * (Tensor elements, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Number element, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Tensor elements, Number test_element, *, bool assume_unique, bool invert, Tensor out)
occured trying to synthesize text This is how voice number 1 sounds like
```
Resolution: This error indicates a type mismatch in the call to isin(). Per the traceback, the failing call is inside transformers' _prepare_attention_mask_for_generation, which passes pad_token_id as a plain Python int via the test_elements keyword; the installed torch build only accepts a Tensor under that keyword (its scalar overload uses the test_element keyword instead). Because the call sits in library code rather than in your own, the practical fix is to align the installed torch and transformers versions so that the generation utilities match the isin() signature your torch provides, or to patch the call so both arguments are Tensors.
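To make the mismatch concrete, here is the failing pattern and a variant that satisfies the (Tensor, Tensor) overload; `inputs` and `pad_token_id` are illustrative stand-ins for what transformers passes internally:

```python
import torch

inputs = torch.tensor([[5, 7, 0, 0]])  # stand-in for tokenized input ids
pad_token_id = 0                        # plain Python int, as in the traceback

# Fails on torch builds whose isin() has no (Tensor, int) keyword form:
#   torch.isin(elements=inputs, test_elements=pad_token_id)

# Wrapping the scalar in a tensor matches the (Tensor, Tensor) overload:
mask = torch.isin(inputs, torch.tensor(pad_token_id))
print(mask.any().item())  # True: padding tokens are present
```

This is only a demonstration of the overload rules; in the actual failure the fix has to land in (or around) the transformers code that makes the call.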
To proceed, you may need to:

- Verify and update the model and its configuration to ensure compatibility with the current hardware and software environment.
- Make sure that all function calls, particularly those involving Tensors, use the argument types the functions expect.
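In practice, since the failing isin() call lives inside transformers rather than in this repo, the commonly reported workaround for this kind of mismatch is to align package versions instead of editing code: either pin transformers to a release whose generation utilities match the installed torch, or move torch forward. The specific versions below are an assumption for illustration, not a verified fix for this setup:

```shell
# Illustrative only - pick versions that satisfy each other and the repo's
# own requirements; this exact pin is an assumption, not a tested fix.
pip install "transformers==4.40.2"

# Or, alternatively, move torch forward instead of pinning transformers:
# pip install --upgrade torch
```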
If you need further assistance or specific code examples to resolve these issues, please provide more details about your setup and the code you're running.