[BUG-UI/UX] Emojis cause wrong start:end indexes in SpanQuestion

Open cceyda opened this issue 1 year ago • 6 comments

Describe the bug

If a SpanQuestion record contains an emoji, the start and end indexes are recorded incorrectly. Example: "Some text yeah 🚀 rocket". If I try to tag the word "rocket", I get a warning message in the UI.

This happens because the indexes overflow: emoji lengths are calculated differently in JavaScript (which counts UTF-16 code units) and Python (which counts Unicode code points). This was also an issue in v1, with a detailed explanation of the cause here: https://github.com/argilla-io/argilla/issues/2353. It should be resolved either by using Python-style length calculation for the span start:end, or by translating between the UI index calculations and the Python calculations.
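To illustrate the second option (translating UI offsets into Python offsets), here is a minimal sketch. The helper names `utf16_length` and `utf16_to_codepoint_offset` are hypothetical and not part of Argilla; the sketch only assumes that the UI reports offsets in UTF-16 code units, as JavaScript string indexing does.

```python
def utf16_length(text: str) -> int:
    """Length in UTF-16 code units, i.e. what JavaScript's String.length reports."""
    return len(text.encode("utf-16-le")) // 2


def utf16_to_codepoint_offset(text: str, utf16_offset: int) -> int:
    """Translate a JavaScript-style offset into a Python str index.

    Hypothetical helper. An offset that lands inside a surrogate pair
    is rounded up to the next full character.
    """
    units = 0
    for index, char in enumerate(text):
        if units >= utf16_offset:
            return index
        # Characters outside the Basic Multilingual Plane (most emoji)
        # take two UTF-16 code units; everything else takes one.
        units += 2 if ord(char) > 0xFFFF else 1
    return len(text)


text = "Some text yeah 🚀 rocket"

# Offsets as the browser would report them when "rocket" is selected.
js_start, js_end = 18, 24

print(len(text), utf16_length(text))  # Python length vs. JavaScript length
print(repr(text[js_start:js_end]))    # raw UI offsets applied in Python

start = utf16_to_codepoint_offset(text, js_start)
end = utf16_to_codepoint_offset(text, js_end)
print(repr(text[start:end]))          # translated offsets
```

With the example string from this report, the JavaScript end offset (24) is larger than the Python length (23), which is consistent with the overflow warning described above.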

You won't even know that anything is wrong if the text contains emojis and you only tag words in the middle... it will just be a silent bug that records start:end off by 1-2.
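The silent case can be reproduced outside the UI. The sketch below is only an illustration: `js_slice` and `contains_astral` are hypothetical helpers, and the sample sentence is made up. `js_slice` emulates JavaScript's slicing by working on UTF-16 code units, and `contains_astral` flags texts where the two index systems can disagree.

```python
def js_slice(text: str, start: int, end: int) -> str:
    """Emulate JavaScript's text.slice(start, end), which indexes UTF-16 code units."""
    data = text.encode("utf-16-le")
    # Each code unit is 2 bytes; a split surrogate pair decodes to U+FFFD.
    return data[2 * start : 2 * end].decode("utf-16-le", errors="replace")


def contains_astral(text: str) -> bool:
    """True if the text has characters that need two UTF-16 code units."""
    return any(ord(char) > 0xFFFF for char in text)


text = "🚀 launch went fine today"

# Offsets the browser would report when "launch" is selected.
js_start, js_end = 3, 9

print(repr(js_slice(text, js_start, js_end)))  # what the annotator selected
print(repr(text[js_start:js_end]))             # what Python reads back
print(js_end <= len(text))                     # in range, so nothing overflows
print(contains_astral(text))                   # record is at risk
```

Here the end offset stays within the Python string length, so no warning would be triggered, yet the stored span is shifted by one character.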

To reproduce

No response

Expected behavior

No response

Screenshots

No response

Environment

  • OS [e.g. iOS]: irrelevant
  • Browser [e.g. chrome, safari]: chrome (but it is a JS code issue, so it likely affects all browsers)
  • Argilla Version [e.g. 1.0.0]: 2.4.1
  • ElasticSearch Version [e.g. 7.10.2]: irrelevant

Additional context

I would say this is critical. At least until it is resolved, there should be a warning in the SpanQuestion docs not to use it with emojis.

cceyda avatar Dec 04 '24 05:12 cceyda

@frascuchon and @jfcalvo we need to tackle this this week if we have time! I have a proposal to fix it here: https://github.com/argilla-io/argilla/pull/5001

damianpumar avatar Dec 10 '24 21:12 damianpumar

Thanks @cceyda for reporting this issue, we really appreciate it.

damianpumar avatar Dec 10 '24 21:12 damianpumar

This issue is stale because it has been open for 90 days with no activity.

github-actions[bot] avatar Mar 11 '25 02:03 github-actions[bot]

still an issue

cceyda avatar Mar 13 '25 05:03 cceyda

This issue is stale because it has been open for 90 days with no activity.

github-actions[bot] avatar Jun 12 '25 02:06 github-actions[bot]

still an issue & important for non-english languages

cceyda avatar Jun 12 '25 04:06 cceyda

This issue is stale because it has been open for 90 days with no activity.

github-actions[bot] avatar Sep 12 '25 02:09 github-actions[bot]

This issue was closed because it has been inactive for 30 days since being marked as stale.

github-actions[bot] avatar Oct 12 '25 02:10 github-actions[bot]