ForwardTacotron
preprocess.py: list index out of range
Hi @cschaefer26. After the espeak backend acceleration feature was added, I downloaded the repository at the latest commit to preprocess a small test dataset (I also tested a dataset of 10000 audio files, which I want to use to train a pretrained model for smaller datasets), and I always get this error. I debugged the "cleaned_texts" array: it stays completely empty and no elements are added to it during preprocessing. With the commit before the espeak feature, the opposite happens: each audio file is processed along with its text, the text is added to that array, and all the phonemized texts are displayed. I've tried many datasets, including in a Colab notebook with the current state of the repository, and the result is the same error. I mention this in case it helps. Here is a log:
280 .wav files found in "C:\Users\LENOVO_User\ForwardTacotron\testdataset"
Using 280 wav files that are indexed in metafile.
+-------------+-----------+--------+------------+-----------+----------------+
| Sample Rate | Bit Depth | Mu Law | Hop Length | CPU Usage | Num Validation |
+-------------+-----------+--------+------------+-----------+----------------+
|    22050    |     9     |  True  |    256     |    4/8    |      200       |
+-------------+-----------+--------+------------+-----------+----------------+
input text to phonemize() is str but it must be list of str
█░░░░░░░░░░░░░░░ 1/280 texts: []
input text to phonemize() is str but it must be list of str
█░░░░░░░░░░░░░░░ 2/280 texts: []
input text to phonemize() is str but it must be list of str
█░░░░░░░░░░░░░░░ 3/280 texts: []
input text to phonemize() is str but it must be list of str
████████████████ 280/280 texts: []
Traceback (most recent call last):
File "preprocess.py", line 150, in
Hi, this is strange. Could you check your version of the phonemizer package? You could also run the cleaner test to see what the problem might be (you will probably need to pip install pytest):
python -m pytest tests/test_cleaner.py
Locally I run phonemizer=2.2 without any issues.
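For anyone following along, a minimal sketch of one way to check the installed version from Python. The helper names `installed_version` and `major_version` are hypothetical and not part of ForwardTacotron or phonemizer. It assumes Python 3.8 or newer, where `importlib.metadata` is in the standard library; on older interpreters, `pip show phonemizer` gives the same information.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.split('.')[0])


# Prints the installed phonemizer version, or None if the package is missing.
print('phonemizer version:', installed_version('phonemizer'))
```

Comparing the major version on both machines is a quick way to spot an interface change between releases.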
@cschaefer26 I'm using the latest phonemizer version (3.2.1). I just ran the pytest tests and got the same error message that appears when each audio file is processed. I also discovered something curious that changed in phonemizer 3.0, so here is the log for you to check:
================================================= test session starts =================================================
platform win32 -- Python 3.7.9, pytest-7.1.3, pluggy-1.0.0
rootdir: C:\Users\LENOVO_User\ForwardTacotron
plugins: anyio-3.6.1
collected 1 item
tests\test_cleaner.py F [100%]
====================================================== FAILURES =======================================================
__________________________________________ TestCleaner.test_call_happy_path ___________________________________________
self = <tests.test_cleaner.TestCleaner testMethod=test_call_happy_path>
    def test_call_happy_path(self) -> None:
        cleaner = Cleaner(cleaner_name='no_cleaners',
                          use_phonemes=True, lang='en-us')
        cleaned = cleaner('hello there!')
tests\test_cleaner.py:11:
utils\text\cleaners.py:82: in __call__
    text = self.backend.phonemize(text, strip=True)
self = <phonemizer.backend.espeak.espeak.EspeakBackend object at 0x00000188CDC9A488>, text = 'hello there!'
separator = None, strip = True, njobs = 1
    def phonemize(self, text: List[str],
                  separator: Optional[Separator] = None,
                  strip: bool = False,
                  njobs: int = 1) -> List[str]:
        """Returns the `text` phonemized for the given language
        Parameters
        ----------
        text: list of str
            The text to be phonemized. Each string in the list
            is considered as a separated line. Each line is considered as a text
            utterance. Any empty utterance will be ignored.
        separator: Separator
            string separators between phonemes, syllables
            and words, default to separator.default_separator. Syllable separator
            is considered only for the festival backend. Word separator is
            ignored by the 'espeak-mbrola' backend.
        strip: bool
            If True, don't output the last word and phone separators
            of a token, default to False.
        njobs : int
            The number of parallel jobs to launch. The input text is
            split in ``njobs`` parts, phonemized on parallel instances of the
            backend and the outputs are finally collapsed.
        Returns
        -------
        phonemized text: list of str
            The input ``text`` phonemized for the given ``language`` and ``backend``.
        Raises
        ------
        RuntimeError
            if something went wrong during the phonemization
        """
        if isinstance(text, str):
            # changed in phonemizer-3.0, warn the user
            raise RuntimeError(
                'input text to phonemize() is str but it must be list of str')
E RuntimeError: input text to phonemize() is str but it must be list of str
..\AppData\Local\Programs\Python\Python37\lib\site-packages\phonemizer\backend\base.py:182: RuntimeError
=============================================== short test summary info ===============================================
FAILED tests/test_cleaner.py::TestCleaner::test_call_happy_path - RuntimeError: input text to phonemize() is str but ...
================================================== 1 failed in 1.23s ==================================================
Thanks for looking into it. I just merged a fix that should solve the problem (it seems that phonemizer >=3.0 changed the interface to expect a List[str] instead of Union[List, str]).
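For readers hitting the same error elsewhere, here is a minimal sketch of the kind of adaptation this interface change calls for. It is not the merged fix itself. `phonemize_single` is a hypothetical helper, and `EchoBackend` is a stand-in that only mimics the str check shown in the traceback above so the example runs without espeak installed. With the real library, an `EspeakBackend` instance would be passed instead.

```python
from typing import List


def phonemize_single(backend, text: str, strip: bool = True) -> str:
    """Phonemize one string with a backend whose phonemize() only accepts a list of str."""
    # phonemizer >= 3.0 raises RuntimeError for a bare str,
    # so wrap the string in a one-element list.
    phonemized = backend.phonemize([text], strip=strip)
    # Per the docstring in the traceback, empty utterances are ignored,
    # so the returned list may be empty.
    return phonemized[0] if phonemized else ''


class EchoBackend:
    """Hypothetical stand-in for a real backend; upper-cases instead of phonemizing."""

    def phonemize(self, text: List[str], strip: bool = False) -> List[str]:
        if isinstance(text, str):
            raise RuntimeError(
                'input text to phonemize() is str but it must be list of str')
        # Skip empty utterances, as the real docstring describes.
        return [line.upper() for line in text if line]


print(phonemize_single(EchoBackend(), 'hello there!'))
```

Wrapping at the call site keeps the rest of the cleaner working with plain strings, which is why the helper unwraps the first element again before returning.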
@cschaefer26 Thank you very much, now it works correctly. I'm thinking of making a public pretrained Tacotron model that can be used to train on smaller datasets. My plan is to increase the training schedule from 40k steps to 80k steps, so that training on the smaller dataset can start from step 40k, since the pretrained Tacotron model was trained up to 40k. Can this work?