Persian-NER

Selecting beginning and end of token instead of labeling each token

behdaad opened this issue 5 years ago • 2 comments

Since each entity is a run of consecutive words, it would be much faster to select several words at once and then choose the label; the label of every token could be inferred from that selection. Take this example: پیست اسکی نسار بیجار استثنایی‌ترین ... ("The Nesar ski slope in Bijar, the most exceptional ...").

[Screenshot]

You could select only پیست ("slope") as the starting word and بیجار ("Bijar") as the ending word, choose the label مکان ("location"), and be done. There would be no need to label each token separately.
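Inferring per-token labels from a start/end selection could look roughly like this. It is a minimal Python sketch that assumes the project stores per-token BIO tags (`B-`/`I-`/`O`), which the issue does not confirm. The helper `spans_to_bio`, the `(start, end, label)` tuple format, and the label `LOC` standing in for مکان are all hypothetical. Flat BIO tags cannot represent nested entities, so the sketch rejects overlapping selections.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, label) with inclusive token indices.
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Expand start/end selections into per-token BIO tags (assumed scheme)."""
    tags = ["O"] * num_tokens  # tokens outside every selection are "O"
    for start, end, label in spans:
        if not 0 <= start <= end < num_tokens:
            raise ValueError(f"span out of range: {(start, end)}")
        # Flat BIO cannot represent nested entities, so reject overlaps.
        if any(tag != "O" for tag in tags[start:end + 1]):
            raise ValueError(f"span overlaps an earlier span: {(start, end)}")
        tags[start] = "B-" + label  # the selected starting word
        for i in range(start + 1, end + 1):
            tags[i] = "I-" + label  # every word up to the selected end
    return tags


tokens = ["پیست", "اسکی", "نسار", "بیجار", "استثنایی‌ترین"]
# Start = پیست (index 0), end = بیجار (index 3); "LOC" is a hypothetical label for مکان.
print(spans_to_bio(len(tokens), [(0, 3, "LOC")]))
# → ['B-LOC', 'I-LOC', 'I-LOC', 'I-LOC', 'O']
```

Because each selection writes its own `B-` tag, two adjacent entities of the same type stay separate: `spans_to_bio(3, [(0, 0, "PER"), (1, 2, "PER")])` gives `['B-PER', 'B-PER', 'I-PER']`.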

However, I'm not sure it's wise to make this the only way to label tokens. I can't point to a case where this method fails, but I'm almost certain there are unusual examples that can't be labeled this way.

This method may or may not be exposed in the API, but I believe it would make labeling by hand in the web interface much easier and faster. (Honestly, since submitting labels reloads the page, labeling tokens is tiresome. Combining this feature with #2 would make manual labeling much faster.)

behdaad, Nov 30 '18 08:11

Thank you. It's a good suggestion, and we will consider it for the development backlog, but as you mentioned, it shouldn't be the only way to label tokens. We are currently working on an improved version of the user panel that may resolve some UX issues such as this one.

Hameds, Nov 30 '18 09:11

https://github.com/chakki-works/doccano is an open-source annotation tool that takes an approach similar to the one suggested here and in #42. I don't see why we need to specify the beginning of a token separately.
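The point about span-based annotation can be illustrated by going the other way. This is a minimal Python sketch that again assumes BIO tags with `B-`/`I-` prefixes (not confirmed by the issue); the helper `bio_to_spans` and the `(start, end, label)` format are hypothetical. It recovers spans from per-token tags, which shows that the `B-` tag carries no information beyond where each span starts. Its main job in BIO is to separate adjacent entities of the same type, and a span list records that boundary directly.

```python
from typing import List, Optional, Tuple

# Hypothetical span format: (start, end, label) with inclusive token indices.
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collapse per-token BIO tags (assumed scheme) into (start, end, label) spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label = ""
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last span
        prefix, _, name = tag.partition("-")  # "B-LOC" -> ("B", "-", "LOC")
        # An open span continues only through I- tags with the same label.
        if start is not None and not (prefix == "I" and name == label):
            spans.append((start, i - 1, label))
            start = None
        # B- always opens a span; an I- with no open span is treated as one too.
        if prefix == "B" or (prefix == "I" and start is None):
            start, label = i, name
    return spans


# Two adjacent person names: only the B- tag separates them in BIO,
# while the span list records the boundary directly.
print(bio_to_spans(["B-PER", "B-PER", "I-PER", "O"]))
# → [(0, 0, 'PER'), (1, 2, 'PER')]
```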

shayan72, Mar 25 '19 01:03