Create a dictionary of obscene words

Open asolntsev opened this issue 1 year ago • 3 comments

Is your feature request related to a problem? Please describe.
I suggest creating a dictionary of obscene words. It may be useful for testing spam filters, blog prettifiers, etc.

I can provide the list in English, Estonian and Russian.

Describe the solution you'd like
It could look like this:

  Faker faker = new Faker(new Locale("en", "US"));
  String blacklistedWord = faker.dictionary().obsent(); // fuck | shit | ass

  Faker faker = new Faker(new Locale("ru", "RU"));
  String blacklistedWord = faker.dictionary().obsent(); // жопа

Additional context
It would also be a very motivating feature for people to submit pull requests. Imagine they need to add "fuck" in their own languages. The easiest and funniest PR ever possible! :)

asolntsev • Jun 27 '24 08:06

Did you truncate the method name on purpose? Shouldn't it be obscenity?

Anyway yes I support this idea.

Though I (we) would have zero ability to review other languages.

Also, we should keep in mind (perhaps even document) that while some words/phrases are just "bad", some are truly hateful/hurtful, and that's hard to nail down (even in languages that you are familiar with).

kingthorin • Jun 27 '24 09:06

Just a word of caution: we recently had a production issue with another faker library, which generated some offensive language that ended up in a customer demo. It was a bit of an unfortunate experience.

Also, I'd be hesitant to put in words which are too offensive, or whose offensiveness is subjective, such as racial references. I would prefer to keep this library as positive as possible; there's nothing stopping people from writing their own faker for cases like this.

bodiam • Jun 30 '24 01:06

Good point. It is also a perfect case for a custom faker with their own YAML or whatever.

kingthorin • Jun 30 '24 01:06

how about not only obscene words but also obscene expressions?

however yes, keeping it on a more positive side also makes sense

snuyanzin • Jul 13 '24 22:07

> how about not only obscene words but also obscene expressions?

What could possibly go wrong here....

I'm not sure what domains you work in, but in the domains where I work, showing these kinds of results could be very damaging to the business. There's nothing stopping you from building your own custom faker if you really need it, but let's keep Datafaker G or PG rated, please.

bodiam • Jul 13 '24 22:07

@snuyanzin @asolntsev if you're realllllllllly keen, you can always use Fallout quotes for the spam filtering:

https://github.com/datafaker-net/datafaker/blob/5a4aa0f8db734ded1b7a8869c6a2502623e69efc/src/main/resources/en/fallout.yml#L109

bodiam • Jul 14 '24 07:07

@bodiam Sorry, I don't understand how. To test a spam filter, I need a provider that stably generates obscene words. Fallout quotes don't suit because only some of them contain obscene words. How can such a test work stably?

asolntsev • Jul 14 '24 14:07
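
The requirement in the comment above (every generated value must trip the filter, and runs must be repeatable) can be sketched with the standard library alone. This is only an illustration under stated assumptions: `StableWordSource`, its `BLOCKED` list, and the toy `filter()` are hypothetical names, none of them Datafaker API, and the placeholder tokens stand in for a real word list.

```java
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical names throughout; none of this is Datafaker API.
class StableWordSource {
    // Placeholder tokens stand in for a real word list.
    static final List<String> BLOCKED =
            List.of("blockedword1", "blockedword2", "blockedword3");

    private final Random random;

    StableWordSource(long seed) {
        // A fixed seed yields the same sequence on every run.
        this.random = new Random(seed);
    }

    // Always returns an entry from BLOCKED, so every sample contains a flagged word.
    String next() {
        return BLOCKED.get(random.nextInt(BLOCKED.size()));
    }

    // Toy filter: flags any text that contains a blocked word.
    static Predicate<String> filter() {
        return text -> BLOCKED.stream().anyMatch(text::contains);
    }

    public static void main(String[] args) {
        StableWordSource source = new StableWordSource(42L);
        Predicate<String> spamFilter = filter();
        for (int i = 0; i < 5; i++) {
            String comment = "Great post, " + source.next() + "!";
            System.out.println(comment + " -> flagged: " + spamFilter.test(comment));
        }
    }
}
```

A mixed source such as a list of game quotes cannot give this guarantee, because only some entries contain a flagged word; a source that draws exclusively from the blocked list can.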

Just use a custom provider with your own yaml, then there are no concerns for the project.

kingthorin • Jul 14 '24 15:07
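
As a rough sketch of what such a project-local provider might look like, the class below keeps per-locale word lists outside the library. It deliberately avoids Datafaker's own custom-provider mechanism, so everything here is an assumption for illustration: `LocalWordProvider` is a hypothetical name, the in-memory map stands in for your own YAML file, and the entries are placeholders rather than real words.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for a project-local provider; not part of Datafaker.
class LocalWordProvider {
    // In a real setup these lists would be loaded from your own YAML file.
    private static final Map<String, List<String>> WORDS = Map.of(
            "en", List.of("en-placeholder-1", "en-placeholder-2"),
            "ru", List.of("ru-placeholder-1"),
            "et", List.of("et-placeholder-1"));

    private final List<String> words;
    private final Random random;

    LocalWordProvider(Locale locale, Random random) {
        // Fall back to the English list when the language has no entry.
        this.words = WORDS.getOrDefault(locale.getLanguage(), WORDS.get("en"));
        this.random = random;
    }

    // Picks one word from the list chosen for this provider's locale.
    String word() {
        return words.get(random.nextInt(words.size()));
    }

    public static void main(String[] args) {
        LocalWordProvider ru =
                new LocalWordProvider(Locale.forLanguageTag("ru-RU"), new Random(1L));
        System.out.println(ru.word());
    }
}
```

Keeping the lists in the consuming project means their content and review stay with the team that needs them, which is the concern raised earlier in the thread.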