Open-Assistant
Assistant should be cautious but still helpful when providing high stakes advice
So the idea of this change is that the assistant should do its best to advise and warn the user of danger, especially danger that the user might be unaware of. These are not meant to be disclaimers in the usual sense, but rather something that can make the assistant more useful to people in dangerous situations. (For the sake of credit: @whoshuu helped suggest adjustments to my original draft.)
Thank you for adding this. I definitely think it's useful to say "hey this might be dangerous". The question is: should the assistant do the quasi-confirmation-popup that you suggest here, or should it just go on and give you the answer after the warning (in the same message)?
@andreaskoepf do you have opinions on this?
I don't have much of a preference (I didn't think much about that distinction); I was mostly just trying to make sure the examples didn't get too long.
From a UX perspective, it's interesting thinking what the conversational agent equivalent of the "Don't show this message again" checkbox would be.
In this case it's definitely a different type of problem as well. There's a big difference between "Right click > delete file" and "I don't have good insurance but my daughter has a high fever, what do I do?"
Yeah maybe there's a way to tag dangerous outputs on a scale of "wholesome" to "existential threat" and change the style of the text if it goes over "threat to life / limb"
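For example (a rough, purely hypothetical sketch; the scale names and the threshold are made up, not anything agreed on):

```python
from enum import IntEnum

class Severity(IntEnum):
    """Hypothetical danger scale, from harmless to catastrophic."""
    WHOLESOME = 0
    MINOR_RISK = 1
    PROPERTY_DAMAGE = 2
    THREAT_TO_LIFE_OR_LIMB = 3
    EXISTENTIAL_THREAT = 4

def format_reply(reply_text: str, severity: Severity) -> str:
    """Prepend a prominent warning once the tagged severity crosses the
    life/limb threshold; lower severities pass through unchanged."""
    if severity >= Severity.THREAT_TO_LIFE_OR_LIMB:
        return ("WARNING: this involves a serious risk to life or health. "
                "Please read the safety notes carefully.\n\n" + reply_text)
    return reply_text

# e.g. a fractal wood burning answer would be tagged as a life/limb risk
print(format_reply("Lichtenberg figures can be made by...",
                   Severity.THREAT_TO_LIFE_OR_LIMB))
```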
@andreaskoepf do you have opinions on this?
Including a strong warning when life/health is at risk absolutely makes sense. As long as it remains possible to extract stored knowledge if the user insists, I am fine with it. In no case should the agent actively try to convince or encourage a user to harm themselves or others, or to do "evil things".
I personally would opt for a liberal approach, e.g. we as developers cannot foresee all possible situations, and too-restrictive precautions could passively inflict harm because the assistant refuses to give medical advice, for example when someone needs help with a stroke or heart attack. Suggesting to visit the "doctors without borders" website would IMO actually be sarcastic... In existing products we see a spectrum of safety rules implemented, e.g. BMW refuses to display the car's digital manual while the car is moving .. Tesla doesn't do this. When a front-seat passenger is trying to read and BMW turns off the screen, it is an eye-roll moment every time .. I am personally more with the brave who believe in the basic intelligence of their users.
TL;DR: This question is hard to answer because it effectively is a political question about individual responsibility.
(Stroke or heart attack were bad examples... but for other diseases a preliminary diagnosis might be possible from symptoms ... I know doctors hate it when people google their symptoms, but there is a demand for self-help and on-demand practical medical knowledge...)
Seems like this is an example of a deeper philosophical discussion that I can only imagine there will be lots of back and forth on over time.
I think building filters, post-processing, safety warnings, etc. is fine and probably ultimately unavoidable, but it's also definitely a slippery slope depending on how it's done.
I think one main aim, though, should be that they are always very clear and transparent, and able to be turned off by users who might want to for whatever reason. So try to stick to "sane defaults" (whatever that means, or as best as the community can agree), but also make sure everything is as open as possible and give enough flexibility that users can make their own informed decisions in terms of what "flavour" of AI they want.
For example, if you post a question on reddit you might get some crazy or dangerous misinformation that might even look quite convincing; getting the same response from a chatbot shouldn't, I think, be treated all that differently. It's just some text the user asked the internet for, and it's kind of up to them at the end of the day; the more the population as a whole can actually handle that, the better. So we should for sure inform, but perhaps not be too paternal, since that's just going to fail anyway. But we probably do project more weight onto a response from an AI than onto one from a random reddit user, so maybe it's not quite a fair comparison.
It's a very difficult and tricky area for sure.
Perhaps there could be something from a product or UX point of view where you can see a handful of different responses from the AI, so that as a user you have no excuse to just blindly pick one and believe it 100%.
I think the debate comes down to whether this is a safety feature or a safety anti-feature (that big tech is so fond of).
I do personally believe that we don't want any anti-features (combatting anti-features is a long open source tradition), but that warnings can be implemented in such a way that they are a feature. (I think a problem with how I wrote the examples is they might sound a little condescending; maybe there's a way to fix that.)
The question is: should the assistant do the quasi-confirmation-popup that you suggest here, or should it just go on and give you the answer after the warning (in the same message)?
I'm not a UX expert, but here's what I think about these examples if I were a user:
- Fractal Wood Burning: I did already know fractal wood burning is dangerous, so the warning isn't that useful to me. However, if I didn't, then the warning would be very useful. In particular, I have a tendency to skip over things when I'm reading, so having this confirmation is a great safety feature. I would probably tell it to continue the explanation out of curiosity, but not actually do the project. In addition, the alternative is highly useful; I didn't know that there were super safe and easy ways to generate Lichtenberg figures until I started researching this example.
- Make a cast: I would likely directly ask about charitable organizations if I needed them, so the warning isn't actually that useful. Just one more step wouldn't annoy me too much though, considering how rare the "make a cast" task should hopefully be! We might want to find a better example though; I'm not familiar with the medical field, so I'm not really sure what a good "warning or alternative the user doesn't know about" would be.
Also, here's a third one I didn't include because it would be a bit specific and long, but it is very realistic for me. This one has a high chance of affecting me as a user:
- Warn about malware or exploits: Let's say I give the assistant some code and I want it to add comments, clean it up, change variable names, etc. Let's imagine this has become routine for me. However, the code I give it has an injection exploit, or one of the libraries has malware, or it deals with partitions and might accidentally delete the home partition, etc. In this example, I could easily see myself skipping the explanation and copying the code, so some sort of warning I have to confirm beforehand could be very useful (see the sketch just below for the kind of thing I mean). I can see myself going "why didn't the dumb computer warn me" even if I did XD.
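To make that concrete, here's a hedged, purely hypothetical sketch (the function and table names are made up): it looks like routine cleanup work, but it contains a SQL injection that a confirmation-style warning should catch before I blindly copy the result.

```python
import sqlite3

def find_user(conn: sqlite3.Connection, username: str):
    # Looks like ordinary "please tidy this up" material, but the f-string
    # below is a classic SQL injection: a username like "x' OR '1'='1"
    # returns every row. The assistant should flag this before I copy the
    # cleaned-up version, not just silently reformat it.
    query = f"SELECT id, email FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # The parameterized form the assistant could suggest alongside the warning.
    return conn.execute(
        "SELECT id, email FROM users WHERE name = ?", (username,)
    ).fetchall()
```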
So in summary, in most high-stakes cases I think confirmation is useful. (Perhaps we could include examples of lower-stakes situations where the assistant doesn't ask for confirmation? Like maybe if a DIY electrical project is slightly riskier than normal, it just says "this project presents some unexpected dangers, so I will try to explain additional safety measures as we go".)
I see it kind of like rear-view cameras on a car: although they are designed to prevent the user from doing dangerous things, they are informational and ultimately make the user feel more in control. Knowing that the assistant will try to warn of danger and present alternatives can give me more confidence.
I think a possible issue is mis-generalization, where it generates erroneous warnings frequently. For example, if it puts a warning every time I paste code and the warning is wrong, then this would definitely become an anti-feature for me. Another way it could become an anti-feature is if the warnings generalize in a way that hurts capabilities.
I am very impressed that our community is thinking about this issue so deeply. I vote in favor of giving warnings as part of the prompt guidelines. As for LAION's and the Open-Assistant safety policy in general, we are still developing that, and this discussion is ongoing :)
@ChristopherKing42 the change does not pass the pre-commit checks (trailing whitespaces) .. could you please commit a fix? thx!
@andreaskoepf okay, I replaced the double spaces at the ends of lines with backslashes.
Hmm, how do I fix the prettier error?
Normally, after pre-commit runs, the files should be fixed and just need to be added and committed again. If you installed pre-commit later, you can run pre-commit run -a to check all files, or pre-commit run --files <filename> to check individual files.
@andreaskoepf oh right, it does say that in the contribution guide (I've just been using the web editor up until now 😅). It should be fixed now (I also squashed down all the formatting commits).