faker
Add phone.number generation options
Clear and concise description of the problem
Currently the phone.number method generates phone numbers in various formats. However, phone number inputs usually require a specific format, e.g. an international telephone number.
Suggested solution
It would be nice if it were possible to specify more precisely which phone numbers should be generated.
- type?: 'landline' | 'cell-phone'
- format?: 'international' | 'national'
It is not a goal to generate phone numbers that are normalized beyond the given options. So the user might still have to strip whitespace or other "human" additions.
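A minimal sketch of that post-processing step, assuming the caller wants digits only with an optional leading +. The helper name stripPhoneFormatting and the rule for dropping an "x"-style extension are illustrative assumptions, not part of faker:

```typescript
// Hypothetical helper (not a faker API): reduces a "human" phone number
// to its digits, keeping a leading "+" if one is present.
function stripPhoneFormatting(input: string): string {
  // Drop a trailing extension such as " x1623" (assumed form: "x" + digits).
  const withoutExtension = input.replace(/\s*x\d+$/i, "");
  const hasPlus = withoutExtension.trim().startsWith("+");
  // Remove everything that is not a digit (spaces, dots, dashes, parentheses).
  const digits = withoutExtension.replace(/\D/g, "");
  return hasPlus ? `+${digits}` : digits;
}

console.log(stripPhoneFormatting("+1 (742) 779-4379"));
```

Whether an extension should be dropped or kept separately depends on the consuming code; this sketch simply discards it.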
Alternative
- https://github.com/faker-js/faker/issues/1541
Additional context
No response
We actually have a real-world issue related to this:
We use react-phone-number-input with a restriction for country=US, but faker.phone.number() can't be configured easily to give us US locale only.
Therefore, we encounter this warning within our unit tests:
console.error
[react-phone-number-input] Expected phone number +17427794379 to correspond to country US but in reality it corresponds to country CA.
If you do faker.locale = 'en_US', I think you should only get US area codes, not including 742? https://github.com/faker-js/faker/blob/next/src/locales/en_US/phone_number/area_code.ts
I guess I can just wait for https://github.com/faker-js/faker/pull/1735 to be merged, which makes en_US the default and kicks out all other locales?
There's a difference between the default locale en, which is generic for worldwide English, and en_US, which is specific to the United States; #1735 won't change that.
So use en if you want a mix of phone numbers from different countries, en_US for the US, en_GB for Great Britain, etc.
Hmm, the problem is that we currently use faker.phone.number('+1##########'), which doesn't care about the locale that is set.
When switching to faker.phone.number(), all kinds of formats are generated, like 860.297.1343 or 572-614-7168 x1623, as described in the original issue description, which then doesn't work with our code.
This issue is exactly there for that (to add options to get only the expected formats without manually passing it to the method).
Yeah, perfect! Then I will eagerly await the new functionality ;)
I can also offer to test changes out with our codebase if you guys have the need for it.
It is not a goal to generate phone numbers that are normalized beyond the given options
Perhaps format = national should generate "messy" numbers with various spaces, hyphens, extensions, etc., but format = international should only generate numbers in the standard https://en.wikipedia.org/wiki/E.123 notation without spaces, i.e. +{COUNTRYCODE}{NATIONALNUMBER}, as that is a common way to store phone numbers in a database?
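As a rough illustration of what "without spaces" would mean for a consumer, here is a small check for the compact +{COUNTRYCODE}{NATIONALNUMBER} form. The function name is hypothetical, and the 15-digit cap is taken from E.164 (the numbering-plan recommendation usually associated with this compact form), which is an assumption about what the option would follow:

```typescript
// Hypothetical check (not a faker API): true for "+" followed by digits only.
// Assumptions: at most 15 digits in total (the E.164 maximum) and no leading 0,
// since country codes never start with 0.
function isCompactInternational(value: string): boolean {
  return /^\+[1-9]\d{1,14}$/.test(value);
}

console.log(isCompactInternational("+17427794379"));
```

A test suite could run such a check over generated values to confirm that no spaces, dashes, or extensions slip through.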
This might be how you write it to your database, but that is not how your users will put it into your input fields.
(I remember some big platform, I don't remember which one (google, github or ...) requested me to input my number with a leading 0 even if they showed the +12 country selector at the front.)
Should we add a clean or similar parameter that can be used to strip these "formatting" characters?
That would work too.
Team Decision
- We want this, but we will decide later how the actual implementation will work.
- This will effectively merge phone and cell phone locale data.
We might add a parameter that allows choosing whether the number should follow the standard or was "input" by a user.
It would be nice if at least a few options were available before the deprecation took effect.
We have a simple use case like faker.phone.number('##########'), which I think would be hard to replicate until this is complete?
The direct replacement for faker.phone.number('##########') is faker.helpers.replaceSymbolWithNumber('##########') - maybe the deprecation message should be modified to make that clearer?
Yes, that would be helpful, if it's not too niche, or link to more docs somewhere?
Will the behavior differ at all between those two (in terms of which numbers can show up in which positions)? I'd thought of using string methods on the number, but that seems messier, and I don't like seeing the deprecation errors.
Nope, if you pass a format parameter to faker.phone.number, it is currently passed directly to the replaceSymbolWithNumber method.
For the original issue, I think the new API should be kept fairly minimal to start with, given the number of possible different "axes" which would make this hard to localize and test:
- cleanliness, i.e. do you want raw numbers like +XXXXXXX, consistently formatted numbers like (XXX) XXX-XXXX, or a variety of human/messy formats?
- include or don't include country codes, or both?
- include or don't include extensions?
- include mobile numbers, landline numbers, or both (and in some locales you can't tell the difference, e.g. en-US)
So maybe there's only a single parameter like "style" = human | raw | national initially, which maps to the combinations of these that we think are most likely to be useful, and each of those points to a separate definition file, e.g.
- human (default) has a big mix of formats, with and without country codes, extensions, dashes and punctuation. This is useful if you are testing user input.
- raw is always in international format with no spaces or extensions, e.g. +XXXXXXXXX. This is useful if you are inserting directly into a database.
- national is in a standardized national format for the locale, i.e. it assumes that the data has been preprocessed, e.g. 0XX XXX-XXXX. This is useful if you are showing it in a mockup.
Then if users say "hey, we really need a new style for this use case", it can be added without worrying too much about breaking existing functionality; you would just need to add a new definition file for the style.
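To make the idea concrete, here is a rough sketch of how a single style parameter could select among per-style format lists. Everything in it is hypothetical: the type and function names, the format strings (placeholders for one made-up locale), and the injectable random source are assumptions for illustration, not faker's actual API or locale data:

```typescript
type PhoneStyle = "human" | "raw" | "national";

// Placeholder patterns for a single made-up locale. In the proposal, each
// style would point to its own definition file per locale.
const formatsByStyle: Record<PhoneStyle, readonly string[]> = {
  human: ["###-###-####", "(###) ###-####", "+1 ### ### ####", "###.###.#### x####"],
  raw: ["+1##########"],
  national: ["(###) ###-####"],
};

// Picks one format for the style and replaces every "#" with a random digit.
// `random` must return a value in [0, 1); it is injectable to allow seeding.
function phoneNumber(style: PhoneStyle = "human", random: () => number = Math.random): string {
  const formats = formatsByStyle[style];
  const format = formats[Math.floor(random() * formats.length)];
  return format.replace(/#/g, () => String(Math.floor(random() * 10)));
}

console.log(phoneNumber("raw"));
console.log(phoneNumber("national"));
```

With this shape, adding a new style would mean adding one more key and one more list of formats, without touching the existing ones.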
Regarding the simple use case faker.phone.number('##########'): IMO this pattern can be very easily generated by faker.string.numeric(10), or is that just an example and the pattern is more complex?
Regarding faker.helpers.replaceSymbolWithNumber('##########') as the direct replacement for faker.phone.number('##########'): IIRC that method is going to get deprecated as well. See #1994 for more details.
In some cases we have a + as well, which works fine with the example above.
But the examples for a possible implementation above seem preferable if / when they’re implemented.
Hmm, now that I think about it: while saying that the method "is mostly a black box that does some arbitrary char -> digit transformations" is true, those are generally transformations which are useful for phone numbers. So maybe it would make sense not to deprecate it; for example, if you want phone numbers in a non-standard but fixed pattern, say (!##) !##-####, that would be quite hard to achieve with other methods.
The issue there is that ! is tailored specifically for US usage. Other locales have other needs.
It can easily be achieved with JS string.replace, faker.string.numeric, or faker.helpers.fromRegexp. You can find more details about them here: https://fakerjs.dev/api/
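For instance, the string.replace route could look roughly like this. It is a sketch under the assumption that # stands for any digit 0-9 and ! for a digit 2-9; the function name fillPattern is made up, and Math.random is used instead of faker's seeded generator, so results are not reproducible via faker.seed:

```typescript
// Fills a fixed pattern using String.prototype.replace only.
// Assumption: "#" is any digit 0-9 and "!" is a digit 2-9.
// `random` must return a value in [0, 1).
function fillPattern(pattern: string, random: () => number = Math.random): string {
  return pattern.replace(/[#!]/g, (symbol) =>
    symbol === "#"
      ? String(Math.floor(random() * 10)) // 0-9
      : String(2 + Math.floor(random() * 8)) // 2-9
  );
}

console.log(fillPattern("(!##) !##-####"));
```

Because the symbol-to-digit rules live in the caller's code, other locales can define their own placeholders instead of relying on the US-specific meaning of !.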
Sure, but I would say there is no hurry to deprecate replaceSymbolWithNumber, at least until we have more flexible options for phone.number(). That way anyone who is currently using a fixed format pattern has the option to either switch to one of the new patterns or build their own.
Anyway, this discussion is a bit off-topic. Let's discuss the deprecation further in https://github.com/faker-js/faker/issues/1994; this issue can be for the future options of the phone.number method.
Regarding faker.helpers.replaceSymbolWithNumber('##########') being the direct replacement for faker.phone.number('##########'): sorry, but the two methods are not the same; replacing one with the other in our application results in test failures.
Well, AFAICT they are:
https://github.com/faker-js/faker/blob/5e900d2df2e17be58991f3d0f2290ed4cd8a33ff/src/modules/phone/index.ts#L36-L43
Unless you don't always provide the format or the JS runtime processes the fallback path even though it doesn't have to.
Seems identical for a given seed:
faker.seed(123); faker.helpers.replaceSymbolWithNumber('##########')
// 6724265774
faker.seed(123); faker.phone.number('##########')
// 6724265774
Comparing the proposed human | raw | national styles to https://www.npmjs.com/package/libphonenumber-js:
- human would be like we have now
- national would be like the result of formatNational()
- raw would be like the result of .getURI(), but without the tel: prefix
My use case is mostly providing default data from my database models so for me it would be nice to get a "parsed" phone number and only mobile numbers. I really like the "no arguments" approach with faker so something like this would be nice for me.
faker.phone.mobile(); // only mobile numbers
faker.phone.landline(); // landline
faker.phone.number(); // mix of everything
I can see the use case for the "human input" format as well, but that is currently not something I use it for. But an argument with "style" I guess could work. Guess that is applicable to many different things. I looked in the documentation for amount for example, which also is prone to many different input formats.
But as I said, my use case is mostly for output so I would want a consistent format, but it would be nice to be able to generate "input" data as well.
Regarding faker.phone.mobile: just FYI, there's no way in the North American Numbering Plan to distinguish between mobile numbers and landlines. https://en.wikipedia.org/wiki/List_of_North_American_Numbering_Plan_area_codes
Would people be OK with me proceeding with a PR to introduce this "style" parameter?