Similar languages resulting in spam of language change suggestion popup

Open olliestanley opened this issue 2 years ago • 21 comments

image

olliestanley avatar Feb 06 '23 20:02 olliestanley

Thanks for opening it here for me.

softyoda avatar Feb 06 '23 20:02 softyoda

Hello, I think I can fix this by storing the choice in local storage.

zruq avatar Feb 06 '23 20:02 zruq

Hello, I think I can fix this by storing the choice in local storage.

Sounds good. We don't want it to persist forever though, so maybe with some expiry.
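
A minimal sketch of what "store the choice with some expiry" could look like. Everything here is an assumption for illustration: the key format, the one-day lifetime, and the function names are hypothetical and not taken from the Open-Assistant codebase. The store is passed in so that `window.localStorage` can be used in a browser and a stub elsewhere.

```typescript
// Subset of the Web Storage API used below; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Assumed lifetime of a dismissal: one day. The thread leaves this open.
const DISMISS_TTL_MS = 24 * 60 * 60 * 1000;

// Hypothetical key format: one entry per (expected, detected) language pair.
function dismissKey(expected: string, detected: string): string {
  return `lang-suggestion-dismissed:${expected}:${detected}`;
}

// Record that the user dismissed the suggestion; the stored value is the expiry time.
function rememberDismissal(
  store: KeyValueStore,
  expected: string,
  detected: string,
  now: number = Date.now(),
): void {
  store.setItem(dismissKey(expected, detected), String(now + DISMISS_TTL_MS));
}

// True while a stored dismissal is still valid; expired or corrupt entries are removed.
function isDismissed(
  store: KeyValueStore,
  expected: string,
  detected: string,
  now: number = Date.now(),
): boolean {
  const key = dismissKey(expected, detected);
  const raw = store.getItem(key);
  if (raw === null) return false;
  const expiresAt = Number(raw);
  if (!Number.isFinite(expiresAt) || now >= expiresAt) {
    store.removeItem(key);
    return false;
  }
  return true;
}
```

In a browser this would be called as `isDismissed(window.localStorage, "fr", "ca")` before showing the popup, and `rememberDismissal(...)` when the user closes it.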

olliestanley avatar Feb 06 '23 21:02 olliestanley

@olliestanley of course, for how long do you want it to persist?

zruq avatar Feb 06 '23 21:02 zruq

@olliestanley of course, for how long do you want it to persist?

Not sure, to be honest. If you have a length in mind, go for it, and if needed we can always change it later.

olliestanley avatar Feb 06 '23 21:02 olliestanley

I take back what I said, I don't think that was a good solution. In most cases we shouldn't let someone reply, for example, in Spanish to an English prompt. This only poses a problem when the two languages are variants of the same language, so I guess a better solution is to replace "ca" === "fr" with something like is_variant_of("ca","fr").

zruq avatar Feb 06 '23 22:02 zruq

Hmm, as in just adding some config for languages we deem similar enough not to show the popup for? That would make sense to me.

olliestanley avatar Feb 06 '23 22:02 olliestanley

@olliestanley Yes exactly, we need a function that determines whether two languages are really different. So should I write one for the 14 supported languages, or search for one that covers all cases? The languages are represented in ISO 639-3.

zruq avatar Feb 06 '23 22:02 zruq

@olliestanley Yes exactly, we need a function that determines whether two languages are really different. So should I write one for the 14 supported languages, or search for one that covers all cases? The languages are represented in ISO 639-3.

I would advocate a config file containing a list of pairs which we consider highly similar. This way it will be easy to change without coding anything when new languages are added. Then whenever we detect a language mismatch, if the detected language and the required language form one of the pairs in the config, we don't show the popup.
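
The pair-list idea above can be sketched as follows. This is only an illustration: the constant name, the function names, and the example pairs are placeholders, and the real list would live in whatever config file the project chooses. Pairs are treated as unordered, so each pair only needs to be listed once.

```typescript
// Hypothetical config: unordered pairs of language codes considered highly similar.
// The entries are illustrative placeholders, not a vetted list.
const SIMILAR_LANGUAGE_PAIRS: ReadonlyArray<readonly [string, string]> = [
  ["ca", "es"],
  ["nb", "nn"],
];

// Build an order-independent, case-insensitive key for a pair of codes.
function pairKey(a: string, b: string): string {
  return [a.toLowerCase(), b.toLowerCase()].sort().join("|");
}

const similarPairKeys = new Set<string>(
  SIMILAR_LANGUAGE_PAIRS.map(([a, b]) => pairKey(a, b)),
);

// Show the popup only on a real mismatch that is not in the similar-pairs list.
function shouldShowLanguagePopup(expected: string, detected: string): boolean {
  if (expected.toLowerCase() === detected.toLowerCase()) return false;
  return !similarPairKeys.has(pairKey(expected, detected));
}
```

Adding a new language then only means adding entries to the list, with no change to the lookup logic.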

olliestanley avatar Feb 06 '23 22:02 olliestanley

I got that one a lot too, and I got a new one suggesting "ro" (Romanian), even though it is very different from French, and the Romanian translation of what I wrote doesn't look close at all. That is a different situation from "ca" to "fr", which are basically the same language, since French and Romanian seem to be very different. My sentence was "Il y a un truc bizarre sur le contenant du bocal" and a (Google) translation is "E ceva ciudat pe recipientul borcanului", which doesn't look like French at all. Also, the popup for changing language isn't translated yet.

I've had a similar one with Spanish too, which is closer to French but nowhere near enough in that case.

Dryhb avatar Feb 07 '23 02:02 Dryhb

@Dryhb After checking the language code list, I found that "ca" doesn't stand for Canadian French but for Catalan, so the problem is with the language detector. For the sentence you provided, it somehow thinks it's closer to Romanian than French.

zruq avatar Feb 07 '23 10:02 zruq

I can provide some samples of text for which this popup appears, if needed.

softyoda avatar Feb 07 '23 10:02 softyoda

Oh, that's why it also sometimes suggests Spanish, since Catalan is close to it. So the issue comes from the language detector, and if you can't fix that (maybe it's not yours), you could just wait for 2-3 prompts to be detected in that language before showing the notification, instead of showing it in the middle of a sentence.

Dryhb avatar Feb 07 '23 11:02 Dryhb

@Dryhb Yes, I'm not qualified to fix this bug, but I can try to improve the user experience. For now it detects the language after 10 words, and rechecks whenever the word count doubles. So I guess to prevent the spam of the popup we could decrease the frequency of rechecks? Or remove them entirely, tbh I don't see why they are needed.
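
The schedule described above (first check at 10 words, recheck each time the word count doubles) can be sketched like this. It is not the project's actual code: the function names and the `allowRechecks` switch are hypothetical, and word counting by whitespace is an assumption.

```typescript
// First detection happens once the text reaches this many words (from the thread).
const FIRST_CHECK_WORDS = 10;

// Assumed word counting: split on runs of whitespace.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Word count at which the next check is due.
// lastCheckedAt is the word count at the previous check, or 0 if none has run yet.
function nextCheckThreshold(lastCheckedAt: number): number {
  return lastCheckedAt < FIRST_CHECK_WORDS ? FIRST_CHECK_WORDS : lastCheckedAt * 2;
}

// Decide whether to run detection now. With allowRechecks = false,
// only the first check ever runs, which removes the repeated popups.
function shouldCheckLanguage(
  text: string,
  lastCheckedAt: number,
  allowRechecks: boolean = true,
): boolean {
  if (!allowRechecks && lastCheckedAt >= FIRST_CHECK_WORDS) return false;
  return countWords(text) >= nextCheckThreshold(lastCheckedAt);
}
```

With rechecks enabled, checks fall at 10, 20, 40, 80 words and so on, so a long answer triggers only a handful of detections; disabling them leaves just the check at 10 words.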

zruq avatar Feb 07 '23 13:02 zruq

That could be useful if you do tasks in two languages and may forget to switch, but that is not often and I think not a lot of people do that. Maybe only check when the prompt is submitted, and if the check returns that the language isn't right, then block the final submission with a popup. That way it would only check fully written prompts and would be more accurate too.

Dryhb avatar Feb 07 '23 16:02 Dryhb

That's a great solution, I'll work on it.

zruq avatar Feb 07 '23 16:02 zruq

I've gotten this message numerous times while writing a prompt or answer, each time being recommended a different language. The last time I got it, there wasn't even a suggested language. I haven't changed languages once and I always write English text, so I'm not sure what triggers the popup.

horribleCodes avatar Feb 07 '23 17:02 horribleCodes

@horribleCodes Can you please share some of those prompts/answers?

zruq avatar Feb 07 '23 17:02 zruq

FYI @mehdi-zibout and all, the popup was removed as part of recent changes (https://github.com/LAION-AI/Open-Assistant/pull/1296). We still have issues identifying the language correctly, but now it is just a small red language tag, so it's much easier to ignore.

I think we would still really appreciate anyone working on fixes to make the actual language detection less likely to flag the content if it's in a similar language or otherwise flagged incorrectly.

othrayte avatar Feb 08 '23 10:02 othrayte

One thing we should consider is checking the detected probability for the expected language and only flag if both the most likely language is > ~90% and the detection probability for the expected is < ~50% or something. (numbers are just an example, I haven't checked if the values sum to 1.0 or are just 0.0 to 1.0).
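
A rough sketch of that two-threshold rule. The `LanguageScore` shape is a hypothetical stand-in for whatever the detector returns, and the 0.9 and 0.5 values are just the example numbers from the comment above. The check does not assume that the probabilities sum to 1.0.

```typescript
// Hypothetical detector output: one probability per candidate language.
interface LanguageScore {
  lang: string;
  probability: number;
}

// Example thresholds from the thread; they would need tuning against real data.
const MIN_TOP_PROBABILITY = 0.9;
const MAX_EXPECTED_PROBABILITY = 0.5;

// Flag only when the detector is confident in another language
// AND gives the expected language a low probability.
function shouldFlagMismatch(scores: LanguageScore[], expected: string): boolean {
  if (scores.length === 0) return false;
  const top = scores.reduce((best, s) => (s.probability > best.probability ? s : best));
  if (top.lang === expected) return false;
  const expectedEntry = scores.find((s) => s.lang === expected);
  // A language missing from the detector output is treated as probability 0.
  const expectedProbability = expectedEntry ? expectedEntry.probability : 0;
  return (
    top.probability > MIN_TOP_PROBABILITY &&
    expectedProbability < MAX_EXPECTED_PROBABILITY
  );
}
```

Requiring both conditions means an uncertain detection, or one where the expected language still scores reasonably well, does not produce a flag.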

othrayte avatar Feb 08 '23 10:02 othrayte

@horribleCodes Can you please share some of those prompts/answers?

Unfortunately, I didn't write any down. I did notice that I still get false positives when writing something with a word more associated with a different language.

For instance, "How do I create a Vornoi pattern in python?" was falsely classified as Italian.

horribleCodes avatar Feb 10 '23 08:02 horribleCodes

I think this was eventually resolved, although I'm not entirely sure by which merge.

olliestanley avatar Mar 22 '23 16:03 olliestanley