faker icon indicating copy to clipboard operation
faker copied to clipboard

Add phone.number generation options

Open ST-DDT opened this issue 2 years ago • 31 comments

Clear and concise description of the problem

Currently the phone.number method generates phone numbers in various formats. However, the usually the phone number inputs require specific input formats e.g. an international telephone number.

Suggested solution

It would be nice if it is possible to be more specific which addresses should be generated.

  • type?: 'landline' | 'cell-phone'
  • format?: 'international' | 'national'

It is not a goal to generate phone numbers that are normalized beyond the given options. So the user might still have to strip whitespace or other "human" additions.

Alternative

  • https://github.com/faker-js/faker/issues/1541

Additional context

No response

ST-DDT avatar Nov 07 '22 23:11 ST-DDT

We actually have a real-world issue related to this:

We use react-phone-number-input with a restriction for country=US, but faker.phone.number() can't be configured easily to give us US locale only.

Therefore, we encounter this warning within our unit tests:

console.error
      [react-phone-number-input] Expected phone number +17427794379 to correspond to country US but in reality it corresponds to country CA.

mcmxcdev avatar Feb 28 '23 08:02 mcmxcdev

We actually have a real-world issue related to this:

We use react-phone-number-input with a restriction for country=US, but faker.phone.number() can't be configured easily to give us US locale only.

Therefore, we encounter this warning within our unit tests:

console.error
      [react-phone-number-input] Expected phone number +17427794379 to correspond to country US but in reality it corresponds to country CA.

if you do faker.locale = 'en_US', i think you should only get US area codes, not including 742? https://github.com/faker-js/faker/blob/next/src/locales/en_US/phone_number/area_code.ts

matthewmayer avatar Feb 28 '23 08:02 matthewmayer

I guess I can just wait for https://github.com/faker-js/faker/pull/1735 to be merged, which makes en_US the default and kicks out all other locales?

mcmxcdev avatar Feb 28 '23 08:02 mcmxcdev

There's a difference between the default locale en which is generic for worldwide English, and en_US which is specific for the United States, #1735 won't change that.

So en if you want a mix of phone numbers from different countries, en_US for US and en_GB for Great Britain, etc.

matthewmayer avatar Feb 28 '23 08:02 matthewmayer

Hmm, the problem is we use faker.phone.number('+1##########') currently which then doesn't care about the locale set.

When switching to use faker.phone.number(), all kinds of formats are generated like 860.297.1343 or 572-614-7168 x1623 as described in the original issue description, which then doesn't work with our code.

mcmxcdev avatar Feb 28 '23 15:02 mcmxcdev

When switching to use faker.phone.number(), all kinds of formats are generated like 860.297.1343 or 572-614-7168 x1623 as described in the original issue description, which then doesn't work with our code.

This issue is exactly there for that (to add options to get only the expected formats without manually passing it to the method).

ST-DDT avatar Feb 28 '23 20:02 ST-DDT

Yeah, perfect! Then I will eagerly await the new functionality ;)

I can also offer to test changes out with our codebase if you guys have the need for it.

mcmxcdev avatar Mar 01 '23 06:03 mcmxcdev

It is not a goal to generate phone numbers that are normalized beyond the given options

perhaps format = national should generate "messy" numbers with various spaces, hyphens, extensions, etc, but format=international should only generate numbers in the standard https://en.wikipedia.org/wiki/E.123 notation without spaces ie +{COUNTRYCODE}{NATIONALNUMBER} as that is a common way to store phone numbers in a database?

matthewmayer avatar Mar 01 '23 09:03 matthewmayer

but format=international should only generate numbers in the standard en.wikipedia.org/wiki/E.123 notation without spaces ie +{COUNTRYCODE}{NATIONALNUMBER} as that is a common way to store phone numbers in a database?

This might be how you write it to your database, but that is not how your users will put it into your input fields. (I remember some big platform, I don't remember which one (google, github or ...) requested me to input my number with a leading 0 even if they showed the +12 country selector at the front.) Should we add a clean or similar parameter that can be used to strip these "formatting" characters?

ST-DDT avatar Mar 01 '23 23:03 ST-DDT

That would work too.

matthewmayer avatar Mar 02 '23 00:03 matthewmayer

Team Decision

  • We want this, but we will decide later how the actual implementation will work.
  • This will effectively merge phone and cell phone locale data.

We might add a parameter that allows choosing whether the number should follow the standard or was "input" by a user.

ST-DDT avatar Mar 16 '23 17:03 ST-DDT

It would be nice if at least a few options were available before the deprecation took effect. We have an simple use case like faker.phone.number('##########'), which I think would be hard to replicate until this is complete?

wyardley avatar Sep 22 '23 04:09 wyardley

It would be nice if at least a few options were available before the deprecation took effect. We have an simple use case like faker.phone.number('##########'), which I think would be hard to replicate until this is complete?

the direct replacement for faker.phone.number('##########') is faker.helpers.replaceSymbolWithNumber('##########') - maybe the deprecation message should be modified to make that clearer?

matthewmayer avatar Sep 22 '23 05:09 matthewmayer

maybe the deprecation message should be modified to make that clearer?

Yes, that would be helpful, if it's not too niche, or link to more docs somewhere?

Will the behavior differ at all between those two (in terms of which numbers can show up in which positions)? I'd thought of using string methods on the number, but that seems messier, and I don't like seeing the deprecation errors.

wyardley avatar Sep 22 '23 05:09 wyardley

Nope, if you pass a format parameter to faker.phone.number it directly passes it to the replaceSymbolWithNumber method at the moment

matthewmayer avatar Sep 22 '23 05:09 matthewmayer

For the original issue, i think the new API should be kept fairly minimal to start with, given the number of possible different "axes" which would make this hard to localize and test:

  • cleanliness ie do you want raw numbers like +XXXXXXX, consistently formatted like (XXX) XXX-XXXX or a variety of human/messy formats
  • include or dont include country codes or both?
  • include or dont include extensions?
  • include mobile numbers, landline numbers or both (and in some locales you can't tell the difference e.g. en-US)

So maybe there's only a single parameter like "style" = human | raw | national initially which maps to the combination of these which we think are most likely to be useful, and each of those points to a seperate definition file, e.g.

  • human (default) has a big mix of formats, with and without country codes, extensions, dashes and punctuation. This is useful if you are testing user input.
  • raw is always in international format with no spaces or extensions e.g. +XXXXXXXXX. This is useful if you are inserting directly into a database
  • national is in a standardized national format for the locale ie assumes that the data has been preprocessed e.g. 0XX XXX-XXXX. This is useful if you are showing in a mockup.

Then if users say "hey we really need a new style for this use case" it can be added without worrying too much about breaking existing functionality, you would just need to add a new definition file for the style.

matthewmayer avatar Sep 22 '23 05:09 matthewmayer

It would be nice if at least a few options were available before the deprecation took effect. We have an simple use case like faker.phone.number('##########'), which I think would be hard to replicate until this is complete?

IMO this patter can be very easily generated by faker.string.numeric(9) or is that just an example and the pattern is more complex?

ST-DDT avatar Sep 22 '23 06:09 ST-DDT

the direct replacement for faker.phone.number('##########') is faker.helpers.replaceSymbolWithNumber('##########')

IIRC that method is going to get deprecated as well. See #1994 for more details.

ST-DDT avatar Sep 22 '23 06:09 ST-DDT

In some cases we have a + as well, which works fine with the example above.

But the examples for a possible implementation above seem preferable if / when they’re implemented.

wyardley avatar Sep 22 '23 06:09 wyardley

hmm, now i think about it, i think while saying that method "is mostly a black box that does some arbritary char -> digit transformations" is true, those are generally transformations which are useful for phone numbers. So maybe it would make sense to not deprecate it, for example if you want phone numbers in a non-standard but fixed pattern say (!##) !##-#### that would be quite hard to achieve with other methods.

matthewmayer avatar Sep 22 '23 06:09 matthewmayer

for example if you want phone numbers in a non-standard but fixed pattern say (!##) !##-#### that would be quite hard to achieve with other methods.

The issue there is that ! is tailored specifically for the US usage. Other locales have other needs.

It can easily achieved with js string.replace, faker.string.numeric, or faker.helpers.fromRegexp. You can find more details about them here: https://fakerjs.dev/api/

ST-DDT avatar Sep 22 '23 06:09 ST-DDT

sure, but i would say there is no hurry to deprecate replaceSymbolWithNumber at least until we have more flexible options for phone.number() - that way anyone who is currently using a fixed format pattern has the option to either switch to one of the new patterns, or build their own.

matthewmayer avatar Sep 22 '23 07:09 matthewmayer

Anyway this discussion is a bit offtopic, lets discuss deprecation further on https://github.com/faker-js/faker/issues/1994 this issue can be for the future options for the phone.number method.

matthewmayer avatar Sep 22 '23 07:09 matthewmayer

It would be nice if at least a few options were available before the deprecation took effect. We have an simple use case like faker.phone.number('##########'), which I think would be hard to replicate until this is complete?

the direct replacement for faker.phone.number('##########') is faker.helpers.replaceSymbolWithNumber('##########') - maybe the deprecation message should be modified to make that clearer?

Sorry, but the two methods are not the same, replacing them in our application results in tests failures.

Amine27 avatar Sep 22 '23 14:09 Amine27

Well, AFAICT they are:

https://github.com/faker-js/faker/blob/5e900d2df2e17be58991f3d0f2290ed4cd8a33ff/src/modules/phone/index.ts#L36-L43

Unless you don't always provide the format or the JS runtime processes the fallback path even though it doesn't have to.

ST-DDT avatar Sep 22 '23 14:09 ST-DDT

the direct replacement for faker.phone.number('##########') is faker.helpers.replaceSymbolWithNumber('##########') - maybe the deprecation message should be modified to make that clearer?

Sorry, but the two methods are not the same, replacing them in our application results in tests failures.

Seems identical for a given seed:

faker.seed(123); faker.helpers.replaceSymbolWithNumber('##########')
// 6724265774

faker.seed(123); faker.phone.number('##########')
// 6724265774

matthewmayer avatar Sep 22 '23 15:09 matthewmayer

For the original issue, i think the new API should be kept fairly minimal to start with, given the number of possible different "axes" which would make this hard to localize and test:

  • cleanliness ie do you want raw numbers like +XXXXXXX, consistently formatted like (XXX) XXX-XXXX or a variety of human/messy formats
  • include or dont include country codes or both?
  • include or dont include extensions?
  • include mobile numbers, landline numbers or both (and in some locales you can't tell the difference e.g. en-US)

So maybe there's only a single parameter like "style" = human | raw | national initially which maps to the combination of these which we think are most likely to be useful, and each of those points to a seperate definition file, e.g.

  • human (default) has a big mix of formats, with and without country codes, extensions, dashes and punctuation. This is useful if you are testing user input.
  • raw is always in international format with no spaces or extensions e.g. +XXXXXXXXX. This is useful if you are inserting directly into a database
  • national is in a standardized national format for the locale ie assumes that the data has been preprocessed e.g. 0XX XXX-XXXX. This is useful if you are showing in a mockup.

Then if users say "hey we really need a new style for this use case" it can be added without worrying too much about breaking existing functionality, you would just need to add a new definition file for the style.

Comparing to https://www.npmjs.com/package/libphonenumber-js

  • human would be like we have now
  • national would be like the result of formatNational()
  • raw would be like the result of .getURI(), but without the tel: prefix
image

matthewmayer avatar Sep 27 '23 02:09 matthewmayer

My use case is mostly providing default data from my database models so for me it would be nice to get a "parsed" phone number and only mobile numbers. I really like the "no arguments" approach with faker so something like this would be nice for me.

faker.phone.mobile(); // only mobile numbers
faker.phone.landline(); // landline
faker.phone.number(); // mix of everything

I can see the use case for the "human input" format as well, but that is currently not something I use it for. But an argument with "style" I guess could work. Guess that is applicable to many different things. I looked in the documentation for amount for example, which also is prone to many different input formats.

But as I said, my use case is mostly for output so I would want a consistent format, but it would be nice to be able to generate "input" data as well.

Gyran avatar Nov 02 '23 10:11 Gyran

faker.phone.mobile

Just FYI, there's no way in the North American Numbering Plan to distinguish between mobile and landlines. https://en.wikipedia.org/wiki/List_of_North_American_Numbering_Plan_area_codes

banshee avatar Dec 12 '23 23:12 banshee

So maybe there's only a single parameter like "style" = human | raw | national initially which maps to the combination of these which we think are most likely to be useful, and each of those points to a seperate definition file, e.g.

  • human (default) has a big mix of formats, with and without country codes, extensions, dashes and punctuation. This is useful if you are testing user input.

  • raw is always in international format with no spaces or extensions e.g. +XXXXXXXXX. This is useful if you are inserting directly into a database

  • national is in a standardized national format for the locale ie assumes that the data has been preprocessed e.g. 0XX XXX-XXXX. This is useful if you are showing in a mockup.

Then if users say "hey we really need a new style for this use case" it can be added without worrying too much about breaking existing functionality, you would just need to add a new definition file for the style.

Would people be OK with me proceeding with a PR to introduce this "style" parameter?

matthewmayer avatar Dec 13 '23 11:12 matthewmayer