Open-Assistant
Similar languages resulting in spam of language change suggestion popup
Thanks for opening it here for me.
Hello, I think I can fix this by storing the choice in local storage.
Sounds good. We don't want it to persist forever, though, so maybe store it with some expiry.
@olliestanley of course, for how long do you want it to persist?
Not sure to be honest. If you have a length in mind go for it and if needed we can always change it later
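For reference, the "store the choice with an expiry" idea could look roughly like the sketch below. Everything here is an assumption made for illustration: the key name, the one-day expiry, and the function names are hypothetical and not taken from the Open-Assistant codebase. The store is passed in as a parameter so the logic can be exercised without a browser.

```typescript
// Minimal subset of the Web Storage API that this sketch relies on.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const DISMISS_KEY = "languagePopupDismissedUntil"; // hypothetical key name
const DISMISS_MS = 24 * 60 * 60 * 1000; // assumed expiry: one day

// Record that the user dismissed the popup; the stored value is the
// timestamp (ms since epoch) at which the dismissal stops applying.
function dismissPopup(store: KeyValueStore, now: number = Date.now()): void {
  store.setItem(DISMISS_KEY, String(now + DISMISS_MS));
}

// True while a dismissal is still in effect.
function isPopupDismissed(store: KeyValueStore, now: number = Date.now()): boolean {
  const raw = store.getItem(DISMISS_KEY);
  if (raw === null) return false;
  const until = Number(raw);
  if (!Number.isFinite(until) || now >= until) {
    store.removeItem(DISMISS_KEY); // expired or corrupt entry: clean up
    return false;
  }
  return true;
}
```

In a browser, `window.localStorage` could be passed as the store, since it provides these three methods.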
I take back what I said, I don't think that was a good solution. In most cases we shouldn't let someone reply for example in Spanish to an English prompt.
This only poses a problem when the two languages are variants of the same language, so I guess a better solution is to replace a check like "ca" === "fr" with something like is_variant_of("ca", "fr").
Hmm as in just adding some config for languages we deem similar enough not to show the popup for? That would make sense to me
@olliestanley Yes, exactly: we need a function that determines whether two languages are really different. So should I write one for the 14 supported languages, or search for one that covers all cases? The languages are represented as ISO 639-3 codes.
I would advocate a config file containing a list of pairs which we consider highly similar. This way it will be easy to change without coding anything when new languages are added. Then whenever we detect a language mismatch, if the detected language and the required language form one of the pairs in the config, we don't show the popup.
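A minimal sketch of that config-driven check, assuming the pairs live in a plain list (it could just as well be loaded from a JSON file). The pair entries are illustrative placeholders rather than a vetted list, and the names `SIMILAR_LANGUAGE_PAIRS` and `shouldShowPopup` are hypothetical.

```typescript
// Assumed config: pairs of language codes we consider highly similar.
// These entries are examples only.
const SIMILAR_LANGUAGE_PAIRS: ReadonlyArray<readonly [string, string]> = [
  ["ca", "es"],
  ["ca", "fr"],
];

// Order-independent key, so ("ca", "fr") and ("fr", "ca") hit the same entry.
function pairKey(a: string, b: string): string {
  return [a, b].sort().join("|");
}

const similarPairs = new Set<string>(
  SIMILAR_LANGUAGE_PAIRS.map(([a, b]) => pairKey(a, b))
);

// Show the popup only on a real mismatch that is not a configured similar pair.
function shouldShowPopup(detected: string, expected: string): boolean {
  if (detected === expected) return false;
  return !similarPairs.has(pairKey(detected, expected));
}
```

With this shape, supporting a new language only means adding pairs to the list; the check itself does not change.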
I got that one a lot too, and I got a new one about "ro" (Romanian), even though it is very different from French; the Romanian translation of what I wrote doesn't look close at all. That is a different situation from "ca" to "fr", which are basically the same language, since French and Romanian seem to be very different. My sentence was "Il y a un truc bizarre sur le contenant du bocal" and a (Google) translation is "E ceva ciudat pe recipientul borcanului", which doesn't look like French at all. Also, the popup for changing language isn't translated yet.
I've had a similar one with Spanish too, which is closer to French but nowhere near close enough in that case.
@Dryhb I checked the language code list: "ca" doesn't stand for Canadian French, but for Catalan. So the problem is with the language detector. For the sentence you provided, it somehow thinks it's closer to Romanian than to French.
I can provide some samples of text from when this popup appears, if needed.
Oh, that's why it also sometimes suggests Spanish, since Catalan is close to it. So the issue comes from the language detector. If you can't fix that (maybe it's not your code), how about waiting for 2-3 prompts to be detected with that language before showing the notification, instead of showing it in the middle of a sentence?
@Dryhb Yes, I'm not qualified to fix that bug, but I can try to improve the user experience. For now it detects the language after 10 words and rechecks whenever the word count doubles. So I guess to prevent the popup spam we could decrease the frequency of rechecks, or remove them entirely? To be honest, I don't see why they are needed.
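To make the recheck schedule concrete, here is a small reconstruction of the rule as described above (first check at 10 words, then again each time the word count doubles). It is a sketch written from that description, not the actual website code, and the function names are hypothetical.

```typescript
const FIRST_CHECK_WORDS = 10; // threshold taken from the description above

// Decide whether detection should run, given the current word count and the
// word count at the previous check (0 if no check has happened yet).
function shouldRunDetection(wordCount: number, lastCheckedAt: number): boolean {
  if (lastCheckedAt === 0) return wordCount >= FIRST_CHECK_WORDS;
  return wordCount >= lastCheckedAt * 2;
}

// Count how many detection runs happen while typing `totalWords` words.
function countChecks(totalWords: number): number {
  let last = 0;
  let checks = 0;
  for (let n = 1; n <= totalWords; n++) {
    if (shouldRunDetection(n, last)) {
      last = n;
      checks++;
    }
  }
  return checks;
}
```

Under this rule a 100-word message is checked four times (at 10, 20, 40 and 80 words), and each of those runs is a chance for a false positive to trigger the popup.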
That could be useful if you do tasks in two languages and forget to switch, but I don't think that happens often or that many people do it. Maybe only check when the prompt is submitted, and if the check finds that the language isn't right, block the final submission with a popup. That way it would only check fully written prompts, which would be more accurate too.
that's a great solution, I'll work on it.
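A rough sketch of the submit-time variant, assuming the detector is a function from text to a language code and the popup is a callback that returns true if the user chooses to submit anyway. Both signatures, and the name `trySubmit`, are hypothetical.

```typescript
// Hypothetical detector signature: full text in, language code out.
type DetectLanguage = (text: string) => string;

interface SubmitResult {
  submitted: boolean;
  detected: string;
}

// Run detection once, on the finished text, and only interrupt the user
// when the detected language differs from the expected one.
function trySubmit(
  text: string,
  expected: string,
  detect: DetectLanguage,
  confirmMismatch: (detected: string) => boolean // e.g. a popup; true = submit anyway
): SubmitResult {
  const detected = detect(text);
  if (detected !== expected && !confirmMismatch(detected)) {
    return { submitted: false, detected };
  }
  return { submitted: true, detected };
}
```

Because detection runs once per submission rather than while typing, the user sees at most one prompt per message.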
I've gotten this message numerous times while writing a prompt or answer, each time being recommended a different language. The last time I got it, there wasn't even a suggested language. I haven't changed languages once and I always write English text, so I'm not sure what triggers the popup.
@horribleCodes Can you please share some of those prompts/answers?
FYI @mehdi-zibout and all, the popup was removed as part of recent changes (https://github.com/LAION-AI/Open-Assistant/pull/1296). We still have issues identifying the language correctly, but now it is just a small red language tag, so it's much easier to ignore.
I think we would still really appreciate anyone working on fixes to make the actual language detection less likely to flag content incorrectly, for example when it's in a similar language.
One thing we should consider is checking the detected probability for the expected language and only flag if both the most likely language is > ~90% and the detection probability for the expected is < ~50% or something. (numbers are just an example, I haven't checked if the values sum to 1.0 or are just 0.0 to 1.0).
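A sketch of that two-threshold rule, assuming the detector can return a score between 0.0 and 1.0 per language code. The 0.9 and 0.5 values are just the example numbers from the comment above, the `LanguageScores` shape is an assumption, and the rule does not depend on the scores summing to 1.0.

```typescript
// Assumed detector output: language code -> score in [0, 1].
type LanguageScores = Record<string, number>;

const TOP_MIN = 0.9; // example threshold, not tuned
const EXPECTED_MAX = 0.5; // example threshold, not tuned

// Flag only when another language is a confident top guess AND the
// expected language scores low.
function shouldFlag(scores: LanguageScores, expected: string): boolean {
  let topLang: string | null = null;
  let topScore = -Infinity;
  for (const lang of Object.keys(scores)) {
    if (scores[lang] > topScore) {
      topLang = lang;
      topScore = scores[lang];
    }
  }
  if (topLang === null || topLang === expected) return false;
  // A language missing from the scores is treated as scoring 0.
  const expectedScore = expected in scores ? scores[expected] : 0;
  return topScore > TOP_MIN && expectedScore < EXPECTED_MAX;
}
```

For example, `shouldFlag({ it: 0.6, en: 0.35 }, "en")` returns false, because the top guess is not confident enough to override the expected language.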
@horribleCodes Can you please share some of those prompts/answers?
Unfortunately, I didn't write any down. I did notice that I still get false positives when writing something with a word more associated with a different language.
For instance, "How do I create a Vornoi pattern in python?" was falsely classified as Italian.
I think this was eventually resolved although I'm not entirely sure by which merge