Vilém Zouhar

Results 48 comments of Vilém Zouhar

cc @gsarti for interest and @mjpost from Twitter discussion.

Hi Anna! Thanks for reporting this issue. I just pushed `v1.1.2` that fixes it. :slightly_smiling_face: Could you update (`pip3 install -U tokenization-scorer`) and also confirm it on your end? Before:...

Actually, hold on, this introduced another bug.

Should return correct values now with `v1.1.4`:
```
$ python3 -c "import tokenization_scorer; print(tokenization_scorer.score('Hello there', metric='seq_len'))"
-2.0
$ python3 -c "import tokenization_scorer; print(tokenization_scorer.score('Hell @@o the @re', metric='seq_len'))"
-4.0
$ python3 -c...
```

Oh no. Let me look into that.

The reason this simplest metric breaks so often is that it tries to compute the average number of tokens per line. But what counts as a line is kinda tough to define...
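To illustrate why "tokens per line" is ambiguous, here is a minimal sketch. It is not the package's actual code: `avg_tokens_per_line` and its `skip_empty` flag are hypothetical names made up for this example, and it assumes tokens are separated by whitespace. The same text gives different averages depending on whether blank lines count as lines.

```python
def avg_tokens_per_line(text, skip_empty=True):
    """Average number of whitespace-separated tokens per line.

    Hypothetical helper for illustration only, not part of tokenization_scorer.
    """
    # Accept either a single string or an already-split list of lines.
    lines = text.splitlines() if isinstance(text, str) else list(text)
    if skip_empty:
        # One possible definition: blank lines are not lines at all.
        lines = [line for line in lines if line.strip()]
    if not lines:
        return 0.0
    return sum(len(line.split()) for line in lines) / len(lines)


print(avg_tokens_per_line("Hello there"))                     # 2.0
print(avg_tokens_per_line("Hell @@o the @re"))                # 4.0
print(avg_tokens_per_line("a b\n\nc d\n"))                    # 2.0
print(avg_tokens_per_line("a b\n\nc d\n", skip_empty=False))  # 4 tokens over 3 lines
```

The first two calls match the magnitudes of the `seq_len` outputs shown above (the package reports them negated). The last two show how a single blank line changes the result.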

Resolved with your merge request in v1.1.6. :slightly_smiling_face:

I finally added automatic tests to the package to catch this. The result should be visible as a badge on the README.