
Assistant create pedophile story describing child abuse

davidak opened this issue 2 years ago • 8 comments

[Screenshot from 2023-04-09 11-02-20, censored]

(Censored to prevent harm, and because in Germany even textual depictions count as child pornography and are illegal.)

Such content can be harmful, as it could motivate child abuse, and it is illegal in some countries. It should not be possible to generate it.

davidak avatar Apr 09 '23 09:04 davidak

By using OpenAssistant in this way you are in violation of the terms of service, which clearly state in section 3.1:

The user may only use the portal for the intended purposes. In particular, he/she may not
misuse the portal. The user undertakes to refrain from generating text that violate criminal
law, youth protection regulations or the applicable laws of the following countries: Federal
Republic of Germany, United States of America (USA), Great Britain, user's place of
residence. In particular it is prohibited to enter texts that lead to the creation of pornographic,
violence-glorifying or paedosexual content and/or content that violates the personal rights
of third parties. LAION reserves the right to file a criminal complaint with the competent
authorities in the event of violations.

Furthermore, the model you are interacting with is a beta model based on LLaMA and will not be released to the public, because the license prohibits such distribution regardless. Pinging @yk, please remove the message tree from the dataset.

LowYieldFire avatar Apr 10 '23 14:04 LowYieldFire

he/she may not misuse the portal

That does not really make it safer when generating such content is this easy. It also does not apply to released models.

Harmful content can also be generated without asking for it, as happened in this case: https://github.com/LAION-AI/Open-Assistant/issues/2421

davidak avatar Apr 11 '23 03:04 davidak

Regardless of the terms of service, this should not happen! We should probably add more safety datasets, especially anything that helps the model respond well in chats about abuse.

CloseChoice avatar Apr 11 '23 13:04 CloseChoice

@davidak thanks for noticing. The truth is, any model that is competent is capable of doing these things, and if it is capable, you will get it out one way or another: a few times at random, but very often if you try. Sometimes you will have to try a bit harder, which is what e.g. OpenAI's safety measures accomplish, but even there, any safety measure is usually circumvented almost immediately. We're building safety models to filter prompts like this, but that doesn't change the underlying reality. On the other hand, the instances where the model does something like this during "normal" operation, i.e. when the user is not out to get something bad out of it, are quite rare, which is the much more important measure in my opinion.
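The prompt-filtering approach mentioned above can be sketched as a gate that runs before generation. This is only an illustrative toy, not the actual Open-Assistant implementation: a production system would use a trained safety classifier, so the scoring function and all names below are assumptions.

```python
# Illustrative sketch of a pre-generation safety gate. The trivial
# keyword-based scorer stands in for what would really be a learned
# classifier; function and variable names are hypothetical.

BLOCKED_TOPICS = {"child abuse", "csam"}  # placeholder categories


def safety_score(prompt: str) -> float:
    """Toy stand-in for a learned classifier: returns risk in [0, 1]."""
    text = prompt.lower()
    return 1.0 if any(topic in text for topic in BLOCKED_TOPICS) else 0.0


def gated_generate(prompt: str, threshold: float = 0.5) -> str:
    """Refuse before generation when the risk score crosses the threshold."""
    if safety_score(prompt) >= threshold:
        return "I can't help with that request."
    return f"<model output for: {prompt!r}>"
```

The point of the gate is that it runs on the prompt, before any tokens are generated, so a refused request never reaches the model at all; as noted above, such filters are routinely circumvented, so they complement rather than replace safety training.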

In any case, if you're interested in pursuing this direction, I invite you to join our red-teaming efforts, which aim to find weaknesses and provide robust safety mechanisms.

yk avatar Apr 11 '23 21:04 yk

In any case, if you're interested in pursuing this direction, I invite you to join our red-teaming efforts

I would love to contribute to that in a more coordinated way. So far I just randomly test anything that comes to mind. I think ethics and safety are very important for the powerful technology currently being created.

How can I join? Right now I cannot dedicate a lot of time to it and might not have internet access in the coming weeks, but I would like to be able to follow the discussion and contribute when possible.

davidak avatar Apr 14 '23 21:04 davidak

How can I join? Right now I cannot dedicate a lot of time to it and might not have internet access in the coming weeks, but I would like to be able to follow the discussion and contribute when possible.

ping me on discord

yk avatar Apr 22 '23 08:04 yk

I think being able to flag or tag responses in the chat UI with various labels could feed into useful workflows and even provide good signals and training data in this area. I'm not sure we have anything like that in the chat UI at the moment; I think it could be a good feature request.
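The flag/tag idea could be represented as a simple record attached to each chat message. A minimal sketch follows; the field names and helper are assumptions for illustration, not the actual Open-Assistant data model.

```python
# Hypothetical flag record for a chat message. In a real system these
# records could feed a moderator review queue and safety training data.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MessageFlag:
    message_id: str
    label: str       # e.g. "unsafe", "illegal", "spam" (assumed label set)
    reporter_id: str
    note: str = ""
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


flags: list[MessageFlag] = []


def flag_message(message_id: str, label: str, reporter_id: str,
                 note: str = "") -> MessageFlag:
    """Record a user-submitted flag against a message."""
    f = MessageFlag(message_id, label, reporter_id, note)
    flags.append(f)
    return f
```

Aggregating such records per message (e.g. counting "unsafe" flags) would give exactly the kind of signal described above for review workflows and further training.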

andrewm4894 avatar Apr 22 '23 09:04 andrewm4894

Actually, I just made this to see if it makes sense as a feature request:

https://github.com/LAION-AI/Open-Assistant/issues/2832

andrewm4894 avatar Apr 22 '23 09:04 andrewm4894

Literally just don't ask it to generate that

IllusionDX avatar Apr 29 '23 07:04 IllusionDX