Ceyda Cinarel (재이다) comments


[Bug] Token Classification emojis cause overlapping spans error & wrong annotations

See https://hsivonen.fi/string-length/. Some examples to test with: "🤦🏼‍♂️", "🤦🏼", "💖", "💘", "💝", "💞", "❣️", "✨". I think converting between JS and Python string lengths can solve this. There are too many emojis and unusual-width characters when working with multiple languages...
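A minimal sketch of the JS-to-Python length conversion suggested above. It assumes the UI reports span offsets as JavaScript string indices (UTF-16 code units) and that the backend slices Python `str`s by code point. The function names are hypothetical and not part of any existing library.

```python
def utf16_to_codepoint_offset(text: str, utf16_offset: int) -> int:
    """Map a JS string index (UTF-16 code units) to a Python str index (code points)."""
    units = 0  # UTF-16 code units consumed so far
    for index, char in enumerate(text):
        if units == utf16_offset:
            return index
        if units > utf16_offset:
            # The previous character was a surrogate pair and the offset split it.
            raise ValueError(f"offset {utf16_offset} splits a surrogate pair")
        # Code points above U+FFFF take two UTF-16 code units in JS.
        units += 2 if ord(char) > 0xFFFF else 1
    if units == utf16_offset:
        return len(text)  # offset points at the end of the string
    raise ValueError(f"offset {utf16_offset} is out of range or splits a surrogate pair")


def js_span_to_python(text: str, start: int, end: int) -> tuple[int, int]:
    """Convert a [start, end) span from JS offsets to Python offsets."""
    return utf16_to_codepoint_offset(text, start), utf16_to_codepoint_offset(text, end)


text = "I \U0001F496 NLP"
# In JS, "NLP" starts at index 5 because the heart emoji occupies two code units.
start, end = js_span_to_python(text, 5, 8)
print(start, end, text[start:end])  # 4 7 NLP
```

Raising on offsets that land inside a surrogate pair is a design choice: such an offset cannot come from a valid selection, so failing loudly seems safer than silently rounding.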

[Bug] Token Classification emojis cause overlapping spans error & wrong annotations

Here is another example: https://user-images.githubusercontent.com/15624271/219418599-f0879a98-fa39-4fd7-8499-f9ddd58d54c2.mov I understand why it happens, but not how to fix it 🤣. Just playing around in a notebook, you can see why...

[Bug] Token Classification emojis cause overlapping spans error & wrong annotations

Yes, JavaScript's `.length` counts UTF-16 code units, while Python's `len()` counts code points (not UTF-8 bytes; that would be `len(s.encode("utf-8"))`). The key concepts to understand are **Unicode code points**, **graphemes**, and **UTF-16** encoding. I meant...
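A quick way to see the three different "lengths" side by side, using only the standard library. The helper name is hypothetical. Counting graphemes (1 for this emoji) is not covered here, because the standard library has no grapheme segmentation and would need a third-party library.

```python
def length_report(text: str) -> dict[str, int]:
    """Count the same string in code points, UTF-16 code units, and UTF-8 bytes."""
    return {
        "code_points": len(text),  # what Python's len() returns
        # "utf-16-le" adds no BOM, so every 2 bytes is one code unit (what JS .length counts)
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


# Man facepalming, medium skin tone: U+1F926 U+1F3FC U+200D U+2642 U+FE0F
facepalm = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"
print(length_report(facepalm))
# {'code_points': 5, 'utf16_units': 7, 'utf8_bytes': 17}
```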

[Bug] Token Classification emojis cause overlapping spans error & wrong annotations

Also learned that in JS, if you use spread syntax (array expansion), you get the number of code points, the same count as Python:
```js
[..."🤦🏼‍♂️"].length // 5 code points; "🤦🏼‍♂️".length is 7 (UTF-16 code units)
```
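On the Python side, the counterpart of JS spread syntax is `list()`, which also splits a string into code points. The sketch below contrasts that with the UTF-16 code units that plain JS indexing (`s[i]`, `s.charCodeAt(i)`) sees; the helper `utf16_code_units` is a hypothetical name used for illustration.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units JS indexes, i.e. s.charCodeAt(i) for each i."""
    data = text.encode("utf-16-le")  # little-endian, no BOM
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


facepalm = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"

code_points = list(facepalm)  # like [...facepalm] in JS
units = utf16_code_units(facepalm)

print(len(code_points), len(units))  # 5 7
# U+1F926 is stored in UTF-16 as the surrogate pair D83E DD26
print(hex(units[0]), hex(units[1]))  # 0xd83e 0xdd26
```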

[Bug] Token Classification emojis cause overlapping spans error & wrong annotations

But that is how Python counts too! It counts code points. How humans perceive a single _letter_ (A, B, C, etc.; you can think of this as the _grapheme_) and how a single _grapheme_ is...
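To make the grapheme vs. code point distinction concrete, here is a small illustration using the standard library `unicodedata` module: the single facepalm "letter" a human sees is built from five code points. The helper name is hypothetical.

```python
import unicodedata


def code_point_names(text: str) -> list[str]:
    """List each code point in text as 'U+XXXX NAME'."""
    return [f"U+{ord(char):04X} {unicodedata.name(char, '<unnamed>')}" for char in text]


facepalm = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"  # one grapheme on screen
for line in code_point_names(facepalm):
    print(line)
# U+1F926 FACE PALM
# U+1F3FC EMOJI MODIFIER FITZPATRICK TYPE-3
# U+200D ZERO WIDTH JOINER
# U+2642 MALE SIGN
# U+FE0F VARIATION SELECTOR-16
```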

[Bug] Token Classification emojis cause overlapping spans error & wrong annotations

I would strongly suggest not straying from the norm of using `len()` / `list()` on the Python side (i.e. counting code points), because that is basically how most tokenization libraries work (transformers, spaCy, ...
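A minimal sketch of the approach argued for here: keep spans as Python code-point offsets (matching `len()` and tokenizer offsets) and convert to UTF-16 only when sending them to a JS frontend. This assumes the frontend expects JS string indices; the function names are hypothetical.

```python
def codepoint_to_utf16_offset(text: str, index: int) -> int:
    """Map a Python str index (code points) to a JS string index (UTF-16 code units)."""
    if not 0 <= index <= len(text):
        raise IndexError(f"index {index} out of range for text of length {len(text)}")
    # Each code point above U+FFFF becomes a surrogate pair (2 UTF-16 units) in JS.
    return sum(2 if ord(char) > 0xFFFF else 1 for char in text[:index])


def span_to_utf16(text: str, start: int, end: int) -> tuple[int, int]:
    """Convert a [start, end) code-point span to UTF-16 offsets for the UI."""
    return codepoint_to_utf16_offset(text, start), codepoint_to_utf16_offset(text, end)


text = "Hi \U0001F498 there"
start = text.index("there")  # 5 in Python
print(span_to_utf16(text, start, start + len("there")))  # (6, 11)
```

Converting only at the UI boundary keeps every stored annotation consistent with what Python-based tokenizers report.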

[Bug] Token Classification emojis cause overlapping spans error & wrong annotations

Bump, as this is still important.

[Bug] Token Classification emojis cause overlapping spans error & wrong annotations

bump

[BUG-UI/UX] `SpanQuestion` selections can get weirdly overwritten

+1 for "CTRL-Z-like return to the previous state"

[WIP] pass kwargs to config

👍 Anyway, I wasn't expecting a change to something as fundamental as `.from_pretrained` to be reasonable or easy 😅. @sgugger While working on this, I realized a couple of things. Will make separate...