Vilém Zouhar
cc @gsarti for interest, and @mjpost from the Twitter discussion.
Hi Anna! Thanks for reporting this issue. I just pushed `v1.1.2`, which fixes it. :slightly_smiling_face: Could you update (`pip3 install -U tokenization-scorer`) and confirm it on your end? Before:...
Actually, hold on: this introduced another bug.
It should return correct values now with `v1.1.4`:

```
$ python3 -c "import tokenization_scorer; print(tokenization_scorer.score('Hello there', metric='seq_len'))"
-2.0
$ python3 -c "import tokenization_scorer; print(tokenization_scorer.score('Hell @@o the @re', metric='seq_len'))"
-4.0
$ python3 -c...
```
Oh no. Let me look into that.
This simplest metric breaks so often because it tries to compute the average number of tokens per line, and what counts as a line is surprisingly hard to define...
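To illustrate why the line definition matters, here is a minimal sketch of a "negated average tokens per line" score. It is not the package's implementation: `seq_len_score` and `split_mode` are hypothetical names, tokens are assumed to be whitespace-separated, and the negative sign is assumed only because it mirrors the `-2.0` and `-4.0` outputs above.

```python
from typing import Iterable, Union


def seq_len_score(text: Union[str, Iterable[str]], split_mode: str = "splitlines") -> float:
    """Hypothetical sketch: negated average number of whitespace tokens per line.

    Assumptions (not taken from tokenization_scorer): tokens are separated by
    whitespace, and the score is negated so that shorter sequences score higher.
    """
    if isinstance(text, str):
        if split_mode == "splitlines":
            # str.splitlines() does not produce an empty final line for a trailing newline
            lines = text.splitlines()
        elif split_mode == "newline":
            # str.split("\n") DOES produce an empty final line for a trailing newline
            lines = text.split("\n")
        else:
            raise ValueError(f"unknown split_mode: {split_mode}")
    else:
        # Already a sequence of lines (e.g. read from a file)
        lines = list(text)

    if not lines:
        return 0.0

    total_tokens = sum(len(line.split()) for line in lines)
    return -total_tokens / len(lines)


print(seq_len_score("Hello there"))                              # -2.0
print(seq_len_score("Hello there\n"))                            # -2.0
print(seq_len_score("Hello there\n", split_mode="newline"))      # -1.0
print(seq_len_score(["Hello there", "General Kenobi"]))          # -2.0
```

The same text with a single trailing newline yields `-2.0` or `-1.0` depending only on how lines are split, which is exactly the kind of ambiguity that makes this metric easy to get wrong.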
Resolved with your merge request in `v1.1.6`. :slightly_smiling_face:
I finally added automated tests to the package to catch this. The result should be visible as a badge in the README.