
ValueError: cannot align the utterance and %mor tiers (v2)

Open timotheecour opened this issue 2 months ago • 2 comments

This is a similar error to https://github.com/jacksonllee/pylangacq/issues/23, but it still occurs even after 0.19.1.

Could we have a **"best-effort"** mode that sets an error field on each failing utterance instead of crashing for the whole file, so that we can at least use the partial data? (This would be in addition to fixing this bug.)
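To make the request concrete, here is a minimal sketch of what "best-effort" could mean per utterance. None of this is pylangacq API: `AlignedUtterance` and `align_best_effort` are hypothetical names, and the plain length check is a simplifying assumption standing in for whatever alignment logic the library really uses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AlignedUtterance:
    # Hypothetical container for illustration; not a pylangacq class.
    words: List[str]
    mor: List[str]
    pairs: Optional[List[Tuple[str, str]]] = None
    error: Optional[str] = None


def align_best_effort(words: List[str], mor: List[str]) -> AlignedUtterance:
    """Pair words with %mor items; record an error instead of raising."""
    if len(words) != len(mor):
        # Assumption: a length mismatch is the only failure mode modeled here.
        message = f"cannot align: {len(words)} words vs {len(mor)} %mor items"
        return AlignedUtterance(words, mor, error=message)
    return AlignedUtterance(words, mor, pairs=list(zip(words, mor)))


good = align_best_effort(["next", "game", "."], ["adj|next", "n|game", "."])
bad = align_best_effort(
    ["it;s", "time", "."], ["pro:per|it", "n:let|s", "n|time", "."]
)

# Callers can keep the utterances that aligned and inspect the rest.
usable = [u for u in (good, bad) if u.error is None]
```

With something like this, one bad utterance would cost one record rather than the whole file, and the `error` field would still show which lines need fixing in the transcript.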

Describe the bug

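import pylangacq
# `paths` is a local helper module; dir_talkbank_media is the root of the local TalkBank data.
file_cha = f"{paths.dir_talkbank_media}/fluency/UMD-CMU/Control/205DM_parent_y1.cha"
reader = pylangacq.read_chat(file_cha)  # raises ValueError (see traceback below)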

Relevant CHILDES or TalkBank data

fluency/UMD-CMU/Control/205DM_parent_y1.cha

Additional context

concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/process.py", line 205, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/process.py", line 205, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 1455, in _parse_chat_str
    utterances = self._get_utterances(all_tiers)
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 1510, in _get_utterances
    raise ValueError(
ValueError: cannot align the utterance and %mor tiers:
Tiers --
{'ADU': 'okay I think it;s time for our next game . \x151741207_1746675\x15', '%mor': 'adj|okay pro:sub|I v|think pro:per|it n:let|s n|time prep|for det:poss|our adj|next n|game .', '%gra': '1|3|LINK 2|3|SUBJ 3|0|ROOT 4|3|OBJ 5|6|MOD 6|4|POBJ 7|6|NJCT 8|10|DET 9|10|MOD 10|7|POBJ 11|3|PUNCT'}
Cleaned-up utterance --
okay I think it;s time for our next game .
Parsed %mor tier --
['adj|okay', 'pro:sub|I', 'v|think', 'pro:per|it', 'n:let|s', 'n|time', 'prep|for', 'det:poss|our', 'adj|next', 'n|game', '.']
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/pathto/talkbank_utils.py", line 363, in bug_D20240328T013624
    reader = pylangacq.read_chat(file_cha)
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 197, in wrapper
    return func(*args, **kwargs)
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 1887, in read_chat
    return cls.from_files([path], match=match, exclude=exclude, encoding=encoding)
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 197, in wrapper
    return func(*args, **kwargs)
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 1034, in from_files
    return cls.from_strs(strs, paths, parallel=parallel)
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 197, in wrapper
    return func(*args, **kwargs)
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 995, in from_strs
    reader._parse_chat_strs(strs, ids, parallel)
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/site-packages/pylangacq/chat.py", line 264, in _parse_chat_strs
    self._files = collections.deque(
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/process.py", line 575, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/timothee/.conda/envs/speakerid_cuda/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
ValueError: cannot align the utterance and %mor tiers:
Tiers --
{'ADU': 'okay I think it;s time for our next game . \x151741207_1746675\x15', '%mor': 'adj|okay pro:sub|I v|think pro:per|it n:let|s n|time prep|for det:poss|our adj|next n|game .', '%gra': '1|3|LINK 2|3|SUBJ 3|0|ROOT 4|3|OBJ 5|6|MOD 6|4|POBJ 7|6|NJCT 8|10|DET 9|10|MOD 10|7|POBJ 11|3|PUNCT'}
Cleaned-up utterance --
okay I think it;s time for our next game .
Parsed %mor tier --
['adj|okay', 'pro:sub|I', 'v|think', 'pro:per|it', 'n:let|s', 'n|time', 'prep|for', 'det:poss|our', 'adj|next', 'n|game', '.']
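Looking at the two tiers in the error message, the mismatch seems to come from the token `it;s` (presumably a typo for `it's` in the transcript): the utterance has it as one word, while the %mor tier has two items for it (`pro:per|it` and `n:let|s`). The quick check below uses only the strings from the traceback. `first_divergence` is a hypothetical helper, and comparing a surface word to the %mor stem is a naive assumption that happens to work for this utterance but would not hold in general (e.g. for inflected forms).

```python
utterance = "okay I think it;s time for our next game ."
mor = (
    "adj|okay pro:sub|I v|think pro:per|it n:let|s n|time "
    "prep|for det:poss|our adj|next n|game ."
)

words = utterance.split()
mor_items = mor.split()


def first_divergence(words, mor_items):
    """Return (index, word, mor_item) at the first position where the
    surface word differs from the %mor stem, or None if none is found."""
    for i, (word, item) in enumerate(zip(words, mor_items)):
        stem = item.split("|")[-1]  # "pro:per|it" -> "it"; "." stays "."
        if word.lower() != stem.lower():
            return i, word, item
    return None


print(len(words), len(mor_items))          # 10 11
print(first_divergence(words, mor_items))  # (3, 'it;s', 'pro:per|it')
```

So the utterance tier has 10 tokens and the %mor tier has 11, and the two diverge at `it;s`.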

timotheecour, Apr 09 '24 00:04