pylangacq
ValueError: cannot align the utterance and %mor tiers (v2)
This is a similar error to https://github.com/jacksonllee/pylangacq/issues/23, but it still occurs after 0.19.1.
Could we have a **"best-effort"** mode that sets an error field on each failing utterance instead of crashing on the whole file, so that we can at least use the partial data? (This is in addition to fixing the bug itself.)
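In the meantime, a coarser per-file workaround is possible on the caller's side. The sketch below is only an illustration of that idea, not an existing pylangacq feature: `read_best_effort` and its `read_fn` parameter are hypothetical names, and it skips a whole failing file rather than a single utterance, so it does not replace the mode requested above.

```python
def read_best_effort(paths, read_fn):
    """Call read_fn on each path; collect failures instead of raising.

    Hypothetical helper, not part of pylangacq. Returns two dicts:
    path -> parsed result, and path -> error message.
    """
    parsed, failed = {}, {}
    for path in paths:
        try:
            parsed[path] = read_fn(path)
        except ValueError as exc:  # the alignment failure is a ValueError
            failed[path] = str(exc)
    return parsed, failed


# Intended usage (requires pylangacq and local .cha files):
#   import pylangacq
#   parsed, failed = read_best_effort(cha_files, pylangacq.read_chat)
```

Reading the files one at a time like this means a single bad transcript only costs that transcript, at the price of losing any batching that a multi-file call would give.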
Describe the bug
import pylangacq
file_cha = f"{paths.dir_talkbank_media}/fluency/UMD-CMU/Control/205DM_parent_y1.cha"
reader = pylangacq.read_chat(file_cha)  # raises ValueError: cannot align the utterance and %mor tiers
Relevant CHILDES or TalkBank data
fluency/UMD-CMU/Control/205DM_parent_y1.cha
Additional context
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/process.py", line 205, in _process_chunk
return [fn(*args) for args in chunk]
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/process.py", line 205, in <listcomp>
return [fn(*args) for args in chunk]
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 1455, in _parse_chat_str
utterances = self._get_utterances(all_tiers)
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 1510, in _get_utterances
raise ValueError(
ValueError: cannot align the utterance and %mor tiers:
Tiers --
{'ADU': 'okay I think it;s time for our next game . \x151741207_1746675\x15', '%mor': 'adj|okay pro:sub|I v|think pro:per|it n:let|s n|time prep|for det:poss|our adj|next n|game .', '%gra': '1|3|LINK 2|3|SUBJ 3|0|ROOT 4|3|OBJ 5|6|MOD 6|4|POBJ 7|6|NJCT 8|10|DET 9|10|MOD 10|7|POBJ 11|3|PUNCT'}
Cleaned-up utterance --
okay I think it;s time for our next game .
Parsed %mor tier --
['adj|okay', 'pro:sub|I', 'v|think', 'pro:per|it', 'n:let|s', 'n|time', 'prep|for', 'det:poss|our', 'adj|next', 'n|game', '.']
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/pathto/talkbank_utils.py", line 363, in bug_D20240328T013624
reader = pylangacq.read_chat(file_cha)
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 197, in wrapper
return func(*args, **kwargs)
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 1887, in read_chat
return cls.from_files([path], match=match, exclude=exclude, encoding=encoding)
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 197, in wrapper
return func(*args, **kwargs)
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 1034, in from_files
return cls.from_strs(strs, paths, parallel=parallel)
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 197, in wrapper
return func(*args, **kwargs)
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 995, in from_strs
reader._parse_chat_strs(strs, ids, parallel)
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 264, in _parse_chat_strs
self._files = collections.deque(
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/process.py", line 575, in _chain_from_iterable_of_lists
for element in iterable:
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
yield _result_or_cancel(fs.pop())
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
return fut.result(timeout)
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
ValueError: cannot align the utterance and %mor tiers:
Tiers --
{'ADU': 'okay I think it;s time for our next game . \x151741207_1746675\x15', '%mor': 'adj|okay pro:sub|I v|think pro:per|it n:let|s n|time prep|for det:poss|our adj|next n|game .', '%gra': '1|3|LINK 2|3|SUBJ 3|0|ROOT 4|3|OBJ 5|6|MOD 6|4|POBJ 7|6|NJCT 8|10|DET 9|10|MOD 10|7|POBJ 11|3|PUNCT'}
Cleaned-up utterance --
okay I think it;s time for our next game .
Parsed %mor tier --
['adj|okay', 'pro:sub|I', 'v|think', 'pro:per|it', 'n:let|s', 'n|time', 'prep|for', 'det:poss|our', 'adj|next', 'n|game', '.']
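From the output above, the mismatch seems to come from the typo `it;s` in the transcript: the utterance keeps it as one word, while the %mor tier has two items for it (`pro:per|it` and `n:let|s`). The check below only illustrates the count difference using the two strings from the error message. Splitting on whitespace is my assumption for the illustration and is not necessarily how pylangacq aligns the tiers internally.

```python
# Strings copied from the error message above.
utterance = "okay I think it;s time for our next game ."
mor = (
    "adj|okay pro:sub|I v|think pro:per|it n:let|s n|time "
    "prep|for det:poss|our adj|next n|game ."
)

# Assumption: one item per whitespace-separated token on each tier.
utterance_tokens = utterance.split()
mor_tokens = mor.split()

print(len(utterance_tokens), len(mor_tokens))  # 10 11
```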