Query regarding sclite_mode
I may be misunderstanding it, but sclite_mode doesn't seem to do anything. I tried it with many reference/hypothesis pairs, and the results are always the same whether I set it to True or False.
For reference, here's a small script I tested:
refs = [('a', 'b', 'c'), ('d', 'e', 'f')]
hyps = [('a', 's', 'x', 'c'), ('e', 'f', 'f')]
EPS = '*'
for ref, hyp in zip(refs, hyps):
    print(align(ref, hyp, EPS))
    print(edit_distance(ref, hyp, sclite_mode=False))
    print(edit_distance(ref, hyp, sclite_mode=True))
print(edit_distance(refs, hyps, sclite_mode=False))
print(edit_distance(refs, hyps, sclite_mode=True))
ans = bootstrap_wer_ci(refs, hyps)
print({"wer": ans["wer"], "ci95": ans["ci95"], "ci95min": ans["ci95min"], "ci95max": ans["ci95max"]})
and this is what gets printed:
[('a', 'a'), ('b', 's'), ('*', 'x'), ('c', 'c')]
{'ins': 1, 'del': 0, 'sub': 1, 'total': 2, 'ref_len': 3, 'err_rate': 0.6666666666666666}
{'ins': 1, 'del': 0, 'sub': 1, 'total': 2, 'ref_len': 3, 'err_rate': 0.6666666666666666}
[('d', '*'), ('e', 'e'), ('f', 'f'), ('*', 'f')]
{'ins': 1, 'del': 1, 'sub': 0, 'total': 2, 'ref_len': 3, 'err_rate': 0.6666666666666666}
{'ins': 1, 'del': 1, 'sub': 0, 'total': 2, 'ref_len': 3, 'err_rate': 0.6666666666666666}
In both cases above, the two modes give the same penalty.
{'ins': 0, 'del': 0, 'sub': 2, 'total': 2, 'ref_len': 2, 'err_rate': 1.0}
{'ins': 0, 'del': 0, 'sub': 2, 'total': 2, 'ref_len': 2, 'err_rate': 1.0}
{'wer': 0.6666666666667462, 'ci95': 0.0, 'ci95min': 0.6666666666667462, 'ci95max': 0.6666666666667462}
@desh2608 would you happen to have any test cases back from when you added the feature? Not sure if it’s a regression or something else.
SCLITE weights ins, del, and sub as 3, 3, and 4 instead of weighting them equally. In most cases, however, I think the resulting edit distance would be the same. I tried to construct some test cases for this but couldn't. I would be curious to see if someone can come up with examples where it makes a difference.
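One candidate where the weights appear to matter: with 3/3/4 costs, five substitutions cost 20 while three deletions plus three insertions cost 18, whereas with equal costs the five substitutions win (5 vs 6). The sketch below checks this with a small standalone dynamic program. Note that `weighted_edit_counts` is a hypothetical helper written only for this illustration, not the library's `edit_distance`, and I have not verified that `sclite_mode=True` reproduces these numbers.

```python
def weighted_edit_counts(ref, hyp, ins_cost=1, del_cost=1, sub_cost=1):
    """Return error counts along one minimum-cost alignment of ref and hyp.

    Hypothetical helper for illustration only; it is not the library code.
    """
    n, m = len(ref), len(hyp)
    # table[i][j] = (cost, ins, del, sub) for aligning ref[:i] with hyp[:j]
    table = [[None] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0:  # delete ref[i - 1]
                c, a, b, s = table[i - 1][j]
                candidates.append((c + del_cost, a, b + 1, s))
            if j > 0:  # insert hyp[j - 1]
                c, a, b, s = table[i][j - 1]
                candidates.append((c + ins_cost, a + 1, b, s))
            if i > 0 and j > 0:  # match or substitute
                c, a, b, s = table[i - 1][j - 1]
                if ref[i - 1] == hyp[j - 1]:
                    candidates.append((c, a, b, s))
                else:
                    candidates.append((c + sub_cost, a, b, s + 1))
            # Keep the cheapest option; ties go to the first candidate listed.
            table[i][j] = min(candidates, key=lambda t: t[0])
    _, ins, dels, subs = table[n][m]
    total = ins + dels + subs
    return {"ins": ins, "del": dels, "sub": subs, "total": total,
            "ref_len": n, "err_rate": total / n}


ref = ("a", "b", "c", "d", "e")
hyp = ("d", "e", "f", "g", "h")

# Equal costs (plain Levenshtein): five substitutions are cheapest.
print(weighted_edit_counts(ref, hyp))
# {'ins': 0, 'del': 0, 'sub': 5, 'total': 5, 'ref_len': 5, 'err_rate': 1.0}

# 3/3/4 costs: deleting a, b, c, matching d, e, and inserting f, g, h is cheapest.
print(weighted_edit_counts(ref, hyp, ins_cost=3, del_cost=3, sub_cost=4))
# {'ins': 3, 'del': 3, 'sub': 0, 'total': 6, 'ref_len': 5, 'err_rate': 1.2}
```

If I have the arithmetic right, a difference in the total needs at least five substitutions traded for three insertion/deletion pairs, which would explain why short test cases never show it. Shorter inputs such as ('a', 'b') vs ('b', 'c') can only change the ins/del/sub breakdown, and only when the equal-cost mode happens to break the tie toward two substitutions.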