faker Faker is unsafe to use in automated tests - real urls and email addresses are generated

Describe the bug

Faker::Internet.url, and Faker::Internet.email generate potentially real domain names and email addresses. Using potentially real urls and email addresses in automated testing environments is a security vulnerability. Faker is extensively used in automated test environments to generate test data.

To Reproduce

Call Faker::Internet.url and see that the result is a url that could resolve to a real domain name, with a tld of .com.

Expected behavior

A TLD of .test or .example should be used instead. See RFC2606

Additional context

I understand that some users would like to see .com urls in their Faker output. I'd like to suggest that this should be easy to do by opting in to this behaviour instead of it being the default. Perhaps a parameter for specifying the tld when generating an email address or url? The default behaviour however should be safe.

Dec 20 '21 22:12 patrick-beep

I can tackle this issue.

Feb 05 '22 03:02 KarlHeitmann

@patrick-beep please check https://github.com/faker-ruby/faker/pull/2506

Faker::Internet.email # => "[email protected]"

Faker::Config.internet_safe_mode = true
Faker::Internet.email # => "[email protected]"
Faker::Internet.url # => "http://marks.example/vern"
Faker::Internet.domain_name # => "brown.example"

Jul 03 '22 14:07 saiqulhaq

We currently have Faker::Internet.safe_email:

https://github.com/faker-ruby/faker/blob/2333f9ee4a4e1d8329e19f9e4b07e927648add7c/doc/default/internet.md?plain=1#L20

I assume going with this approach of configuring internet_safe_mode could include adding a deprecation to Faker::Internet.safe_email. Otherwise, I think having both would lead to confusion.

Thoughts?

Sep 23 '22 04:09 stefannibrasil

I concur that #safe_email could then be deprecated after internet_safe mode is enabled

Sep 27 '22 12:09 Zeragamba

I am wondering why we need to add config - if we already have a safe_email method? And can we not also add a safe_url method.

From my perspective it seems like a smaller change, also more inline with the existing convention perhaps.

Sep 28 '22 13:09 fbuys

@fbuys to be better internet citizens, we want to update Faker to create safe emails by default. Especially since Faker is often used in automated test environments that may accidently spam a real person or system.

Sep 28 '22 14:09 Zeragamba

Oh I got it @Zeragamba, what if we did it the other way around though. Instead we remove safe_email and return safe email addresses with email and then real_world_email would generate "unsafe" email addresses.

Sep 29 '22 15:09 fbuys

I'd still prefer to leave it as #email so that we can keep the api surface clean. plus we can also allow the user to override the safe mode if needed by allowing them to specify a domain to use instead of example or perhaps allow internet_safe: false as needed

Sep 29 '22 20:09 Zeragamba

Oh yes, I like the idea of passing an override in. So I'd vote for: internet_safe: false to turn safe emails off.

Sep 30 '22 16:09 fbuys

We would like this but/and...

We use a "dud domain" checker that would break on these emails. ie, our app prevents users from using a [email protected] or even a [email protected] address.

We do have an internal domain we do allow that is "real", but won't "spam anyone".

Can this feature have a "set what qualifies as a safe domain" config?

eg:

internet_safe: false
internet_safe_domains: ["myinternaldomain.com"]

Oct 07 '22 17:10 hlascelles

These are the changes that will happen when my PR is merged (https://docs.google.com/spreadsheets/d/1Sec5OcigkhJnlyYd8GoE8kCbs4OyoFOR1oI9fP9c6zY/edit?usp=sharing)

it introduces a few configs too

internet_safe_mode= true / false
internet_safe_mode?
internet_safe_domains= ["zzz", "xxx", "ccc"]
internet_safe_domains

what do you think? https://github.com/faker-ruby/faker/pull/2506

edit, sorry this is a my another GitHub account, same as @saiqulhaq

Oct 15 '22 03:10 saiqulhaq-hh

I want to revisit this issue at its core before jumping to implementation ideas.

The real issue here is Faker generating real URL domain names and email addresses.

RFC2606 recommends using Reserved Top Level DNS Names as the best current practices for the internet community:

"To reduce the likelihood of conflict and confusion, a few top level domain names are reserved for use in private testing, as examples in documentation, and the like. Depending on the nature of the test or example, it might be best for it to be referencing a TLD permanently reserved for such purposes.

To safely satisfy these needs, four domain names are reserved as listed and described below.

 .test
.example
.invalid
.localhost

".test" is recommended for use in testing of current or new DNS related code.

".example" is recommended for use in documentation or as examples."

See RFC2606 for more details.

So the top-level domain recommended is .test for the generators and .example for documentation.

The option of disabling a "safe" configuration feels odd to me. We are either safe or not. We choose to follow the security recommendations.

And although Faker::Internet.safe_email says it generates an RFC 2606 compliant fake email, it is not entirely true because example. is an actual address that is being used and could be delivered. .test is the only reserved TLD for testing.

I realize this is not a popular choice, and the decision is not only mine to make, but I'd vote for only using .test as the TLD for all URLs and email addresses on Faker. In the docs, we guide the users not to use Faker URLs or email addresses since we would breach the security recommendations.

If users need to test with real users, I recommend using Faker to generate the fake usernames to append to their domain of choice.

We will leave that security choice to them, but as a library many people use, we don't want to ignore the security best practices.

We have "fake" in the name, after all. We shouldn't be generating real URLs and email addresses.

Nov 04 '22 23:11 stefannibrasil

yea agreed if we do that, then we should release a new major version for breaking change indication to faker's users

Nov 05 '22 02:11 saiqulhaq-hh

@stefannibrasil thanks for the clear explanation. I agree with what you said. And I think the solution will make faker easier to maintain too.

Nov 07 '22 21:11 fbuys

We just released v3 so maybe this one could be released in the beginning of next year :thinking:

We also need to add deprecation warnings to use in preparation for the breaking change.

I'd like to hear more from the other maintainers too, @vbrazo @thdaraujo @Zeragamba

Nov 08 '22 16:11 stefannibrasil

I think the next steps could be adding a deprecation warning in preparation for the release of Faker 4.0.

Nov 27 '22 01:11 stefannibrasil

Quick update: I'm working on adding the deprecation messages to go out on the next release.

Feb 07 '23 16:02 stefannibrasil

I've opened up the first PR here, constructive feedback is welcome:

https://github.com/faker-ruby/faker/pull/2709

After it's merged, the Changelog will announce why we are moving in this direction, and hopefully it will be a smooth transition for users.

Once we give enough time for users to switch to use the safe_email and safe_url only, faker 4 will be released with this breaking change.

Feb 11 '23 03:02 stefannibrasil

We came up with a new approach here: https://github.com/faker-ruby/faker/pull/2733

Mar 21 '23 02:03 stefannibrasil