dinglehopper

Review error rate definitions etc.

Open · mikegerber opened this issue 4 years ago · 2 comments

mikegerber, Nov 10 '20

I suggest using the alignment path length as the denominator instead of the GT length (dividing by the GT length lets the error rate exceed 1):

https://github.com/qurator-spk/dinglehopper/blob/249787686f554ceee4a14c2610772095320d912a/qurator/dinglehopper/character_error_rate.py#L24-L28
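A minimal sketch of GT-length normalization, to show how the rate can exceed 1. This is plain Python for illustration, not dinglehopper's actual code; levenshtein and cer_gt are hypothetical helper names:

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (or match at cost 0)
            ))
        prev = curr
    return prev[-1]


def cer_gt(gt: str, ocr: str) -> float:
    """Hypothetical helper: edit distance divided by the GT length (non-empty GT assumed)."""
    return levenshtein(gt, ocr) / len(gt)


# One substitution plus two insertions against a GT of length 1.
print(cer_gt("a", "xyz"))  # 3.0
```

Insertions in the OCR output add to the distance but not to the GT length, which is what pushes the rate above 1.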

(Ideally, implement all three length options: alignment path length, maximum sequence length, and GT sequence length.)
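To compare the three options, here is a sketch that assumes an alignment is already available as a list of (gt_char, ocr_char) pairs, with None marking a gap. That representation and the error_rates name are assumptions for illustration only:

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def error_rates(alignment: List[Pair]) -> dict:
    """Edit distance over three candidate denominators (non-empty GT assumed)."""
    # Every pair that is not an exact match costs 1 (substitution, insertion or deletion).
    distance = sum(1 for g, o in alignment if g != o)
    gt_len = sum(1 for g, _ in alignment if g is not None)
    ocr_len = sum(1 for _, o in alignment if o is not None)
    return {
        "alignment_path": distance / len(alignment),
        "max_sequence": distance / max(gt_len, ocr_len),
        "gt_sequence": distance / gt_len,
    }


# GT "abc" vs OCR "bca": delete "a", match "b" and "c", insert "a".
alignment = [("a", None), ("b", "b"), ("c", "c"), (None, "a")]
print(error_rates(alignment))
```

Here the alignment-path rate is 0.5 while the other two are 2/3. Since each aligned pair contributes at most one edit, the alignment-path rate can never exceed 1.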

The problem for dinglehopper is that your levenshtein_matrix does not give you the alignment path; you only get the resulting minimum distance.
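If the full dynamic-programming matrix is kept (not just the final distance), one optimal alignment path can be recovered by walking back from the bottom-right cell. A minimal sketch assuming unit costs; align is a hypothetical helper, not part of dinglehopper:

```python
from typing import List, Optional, Tuple


def align(gt: str, ocr: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return one optimal alignment as (gt_char, ocr_char) pairs; None marks a gap."""
    n, m = len(gt), len(ocr)
    # d[i][j] = edit distance between gt[:i] and ocr[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                               # deletion
                d[i][j - 1] + 1,                               # insertion
                d[i - 1][j - 1] + (gt[i - 1] != ocr[j - 1]),  # match/substitution
            )
    # Backtrace: prefer the diagonal, then deletion, then insertion.
    path = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (gt[i - 1] != ocr[j - 1]):
            path.append((gt[i - 1], ocr[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            path.append((gt[i - 1], None))
            i -= 1
        else:
            path.append((None, ocr[j - 1]))
            j -= 1
    path.reverse()
    return path


path = align("abc", "bca")
distance = sum(1 for g, o in path if g != o)
print(path, distance, distance / len(path))
```

Different optimal paths can have different lengths (for "ab" vs "ba", two substitutions and deletion+match+insertion both cost 2), so the path-length denominator depends on the tie-breaking rule. rapidfuzz also exposes Levenshtein.editops and Levenshtein.opcodes, which return the edit operations directly.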

bertsky, Jun 09 '21

Update: I recommend using rapidfuzz's Levenshtein.normalized_distance instead of just dividing the distance by the GT length. Internally (in the C++ backend), the denominator is the maximum possible distance, which for unit weights is the length of the longer sequence.
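For unit weights, rapidfuzz documents normalized_distance as the distance divided by the maximum possible distance. The pure-Python sketch below mirrors that behaviour so it can be compared with GT-length division; normalized_levenshtein is a hypothetical name, and with rapidfuzz installed, rapidfuzz.distance.Levenshtein.normalized_distance(gt, ocr) should give the same values for these inputs:

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def normalized_levenshtein(gt: str, ocr: str) -> float:
    """Distance divided by the maximum possible unit-cost distance, max(len(gt), len(ocr))."""
    longest = max(len(gt), len(ocr))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(gt, ocr) / longest


for gt, ocr in [("a", "xyz"), ("abc", "abcd")]:
    by_gt = levenshtein(gt, ocr) / len(gt)
    # GT-length rate vs. max-length rate
    print(gt, ocr, by_gt, normalized_levenshtein(gt, ocr))
```

With unit costs the distance never exceeds the longer length, so this normalization stays in [0, 1]; for "a" vs "xyz" it yields 1.0 where the GT-length division yields 3.0.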

bertsky, Apr 10 '24