ipex-llm
Bark model on Intel GPU takes 60 seconds
Hello, I am attempting to do text-to-speech with Bark on an Intel Arc A770, but it takes around 60 seconds to generate audio. Is that normal? Is there a way to make it faster, like a few seconds? I am following this example: https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models/Model/bark
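For context, my script follows the linked example and loads Bark roughly like this (a trimmed sketch, not the exact repo code; the `suno/bark-small` model id and the prompt are assumptions on my part):

```python
import torch
from transformers import AutoProcessor, BarkModel
from ipex_llm import optimize_model

# Load Bark and let ipex-llm convert the weights to a low-bit format
# (matches the "Converting the current model to sym_int4 format" log line)
processor = AutoProcessor.from_pretrained("suno/bark-small")  # model id is an assumption
model = BarkModel.from_pretrained("suno/bark-small")
model = optimize_model(model)   # sym_int4 by default
model = model.to("xpu")         # run on the Intel Arc A770

text = ("IPEX-LLM is a library for running large language model "
        "on Intel XPU with very low latency.")
inputs = processor(text, return_tensors="pt").to("xpu")

with torch.inference_mode():
    audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()
```

This is the console output I get: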
```
(phytia2) C:\phytia\Phytia>python ./synthesize_speech.py --text "IPEX-LLM is a library for running large language model on Intel XPU with very low latency."
C:\Users\SlyRebula\miniconda3\envs\phytia2\Lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: 'Could not find module 'C:\Users\SlyRebula\miniconda3\envs\Phytia2\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.' If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-07-31 13:47:18,476 - INFO - intel_extension_for_pytorch auto imported
C:\Users\SlyRebula\miniconda3\envs\phytia2\Lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
C:\Users\SlyRebula\miniconda3\envs\phytia2\Lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
2024-07-31 13:47:22,731 - INFO - Converting the current model to sym_int4 format......
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.
Inference time: 54.660537242889404 s
```
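One thing I am not sure about: does that 54 seconds include one-time warm-up (the low-bit conversion and first-run kernel compilation on XPU)? A sketch of how I would time a second run to check, reusing `model` and `inputs` from the snippet above (the warm-up assumption is mine, not from the example):

```python
import time
import torch
# intel_extension_for_pytorch is auto-imported by ipex-llm (see the log),
# which provides torch.xpu.synchronize()

with torch.inference_mode():
    model.generate(**inputs)            # warm-up run, discard the output
    torch.xpu.synchronize()             # wait for queued XPU work to finish
    start = time.time()
    audio_array = model.generate(**inputs)
    torch.xpu.synchronize()
print(f"Inference time: {time.time() - start} s")
```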