
<unk> in parse_hyp_and_timestamp

Open RuslanSel opened this issue 1 year ago • 1 comment

In parse_hyp_and_timestamp(), after tokens = sp.id_to_piece(res.hyps[i]), I get these tokens: '▁', '<unk>', '-', 'mo', 'du', 'le'. Then words = sp.decode_pieces(tokens).split() gives me two words instead of one: '⁇', '-module'. However, time = parse_timestamp(tokens, time) treats these tokens as a single word, so the following assertion fails:

assert len(time) == len(words), (len(time), len(words))
AssertionError: (1, 2)
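For illustration, the mismatch can be reproduced without a model. The sketch below is not icefall's or sentencepiece's actual code: simulated_decode and count_word_starts are hypothetical stand-ins. It assumes the second token is the '<unk>' piece (as the issue title suggests), that the unknown piece is decoded to the surface ' ⁇ ' with surrounding spaces, and that the timestamp side starts a new word only at the first token or at a token beginning with '▁'.

```python
from typing import List

WORD_START = "\u2581"     # "▁", the SentencePiece word-boundary marker
UNK_PIECE = "<unk>"       # assumed name of the unknown piece
UNK_SURFACE = " \u2047 "  # " ⁇ ", assumed decoded surface for the unknown piece


def simulated_decode(tokens: List[str]) -> str:
    """Rough stand-in for sp.decode_pieces(): join pieces, map the
    unknown piece to its surface, and turn the marker into a space."""
    text = "".join(UNK_SURFACE if t == UNK_PIECE else t for t in tokens)
    return text.replace(WORD_START, " ").strip()


def count_word_starts(tokens: List[str]) -> int:
    """Hypothetical stand-in for the timestamp side: a word starts at
    the first token or at any token that begins with the marker."""
    return sum(
        1 for i, t in enumerate(tokens) if i == 0 or t.startswith(WORD_START)
    )


tokens = ["\u2581", "<unk>", "-", "mo", "du", "le"]
words = simulated_decode(tokens).split()
num_times = count_word_starts(tokens)

# The two counts disagree, which is what the assertion catches.
print(num_times, len(words))
```

Under these assumptions, the spaces around the unknown surface split the decoded text into two words, while the token-based count sees only one word start. That would explain the (1, 2) in the AssertionError.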

Thanks in advance.

RuslanSel, Sep 27 '23 21:09

@yaozengwei Could you have a look?

csukuangfj, Sep 27 '23 21:09