datafaker Improve the quality of phone numbers generated

At the moment, the quality of some generated phone numbers is quite poor. To demonstrate this, there's a test called PhoneNumberValidityFinderTest.testAllPhoneNumbers, which generates the following output:

en_NZ=53
fr_CH=57
lv_LV=60
pt_PT=66
en_MS=69
tr_TR=84
zh_CN=86
hu_HU=93
en_PH=93
by_BY=94
ar_AR=97
no_NO=99
zh_TW=100
sv_SV=100
es_AR=100
bg_BG=100

This means that for the locales zh_TW, sv_SV, es_AR and bg_BG currently all phone numbers are generated incorrectly.

It would be great if we can improve the quality here a little bit.

May 20 '22 02:05 bodiam

Bulgarian (BG) phone numbers are fixed now.

May 20 '22 03:05 bodiam

no_NO fixed now too.

May 20 '22 13:05 bodiam

@bodiam Should we divide each locale problem to different issues? As for me, it would be better to solve this big problem one-by-one.

May 22 '22 16:05 divinenickname

You're probably right, but github is a bit limiting. I thought it would be fine to just put a comment here. I didn't really want to create ~30 issues just for the phone numbers.

May 22 '22 22:05 bodiam

Current state of PhoneNumberValidityFinderTest.testAllPhoneNumbers test:

sk_SK=65
lv_LV=72
fr_CH=73
en_MS=74
pt_PT=76
tr_TR=84
zh_CN=89
hu_HU=94
by_BY=95
ar_AR=97
es_AR=98
sv_SV=100

Jul 21 '22 08:07 panilya

@bodiam sv_SV is this the correct one? Because I cannot find any information about it. Maybe it's sv_SE?

Jul 21 '22 08:07 panilya

@panilya I'm not 100% sure. There's a script which generates this output, there might be an issue in the script. I know there was some mixup or so in the locales, but not 100% sure, sorry!

Jul 21 '22 10:07 bodiam

@bodiam List of locales. Cannot find the sv_SV locale, I will try to fix this later.

Jul 21 '22 10:07 panilya

Current state of PhoneNumberValidityFinderTest.testAllPhoneNumbers test:
sk_SK=65
lv_LV=72
fr_CH=73
en_MS=74
pt_PT=76
tr_TR=84
zh_CN=89
hu_HU=94
by_BY=95
ar_AR=97
es_AR=98
sv_SV=100

es-AR and zh-CN are fixed now

Jul 21 '22 15:07 panilya

Current state of PhoneNumberValidityFinderTest.testAllPhoneNumbers test:
sk_SK=65
lv_LV=72
fr_CH=73
en_MS=74
pt_PT=76
tr_TR=84
zh_CN=89
hu_HU=94
by_BY=95
ar_AR=97
es_AR=98
sv_SV=100
es-AR and zh-CN are fixed now

en-MS and en-NZ are fixed

Update:

en_AU=53
pt_PT=60
sk_SK=61
lv_LV=65
fr_CH=72
tr_TR=86
by_BY=94
hu_HU=96
ar_AR=99
sv_SV=100

Jul 22 '22 15:07 panilya

Current state of PhoneNumberValidityFinderTest.testAllPhoneNumbers test:
sk_SK=65
lv_LV=72
fr_CH=73
en_MS=74
pt_PT=76
tr_TR=84
zh_CN=89
hu_HU=94
by_BY=95
ar_AR=97
es_AR=98
sv_SV=100
es-AR and zh-CN are fixed now
en-MS and en-NZ are fixed

Update:
en_AU=53
pt_PT=60
sk_SK=61
lv_LV=65
fr_CH=72
tr_TR=86
by_BY=94
hu_HU=96
ar_AR=99
sv_SV=100

Update:

de_AT=55
en_AU=59
sk_SK=62
lv_LV=64
hu_HU=95
ar_AR=99
sv_SV=100

Aug 09 '22 08:08 panilya

@bodiam At that moment, there are only 2 locales with error code > 50 - ar_AR, sv_SV, these two locales are not countries, it's just languages, so what should I do with this?

Aug 13 '22 19:08 panilya

@panilya how about we just close the ticket? It has improved quite a bit, so to me it's good enough.

Aug 13 '22 20:08 bodiam

@bodiam I wanted to do it. There are like +-4-5 locales with error code > 35. Maybe open another issue for these locales or it's better to not?

Aug 13 '22 20:08 panilya

@bodiam I have a question not related to this issue. You have mentioned a tool for generation providers.md, I work on it and have questions about implementation. How to generate a table in .md format? Am I allowed to install a new library or try to write some custom implementation for generating tables? Also, how to get Since field? Use Reflection API or something related to it?

Aug 14 '22 21:08 panilya

@bodiam I wanted to do it. There are like +-4-5 locales with error code > 35. Maybe open another issue for these locales or it's better to not?

I think you can lower the error threshold, and use the same issue if you want?

Aug 14 '22 22:08 bodiam

@bodiam I have a question not related to this issue. You have mentioned a tool for generation providers.md, I work on it and have questions about implementation. How to generate a table in .md format? Am I allowed to install a new library or try to write some custom implementation for generating tables? Also, how to get Since field? Use Reflection API or something related to it?

In regards to new libraries, I'd like to keep the number of libraries used for the Datafaker core to an absolute minimum, but for tools like this, this is not applicable, since they won't get distributed as part of the Datafaker-core anyway. This means that you can use any tool, library, framework, etc you see fit, no objection at all.

What I had in mind myself for the providers.md generation was that I parse the source files, and create an output .md table with some printlns. Really basic stuff, but the parsing of the source files was a bit harder (for example, I'd like to get the @since and the Javadoc, but sometimes the Javadoc is multiline, or there's none, and it just made parsing a bit harder than I had the time for).

You could also use Reflection API, if that's easier, but I considered to use something like JavaParser or so, but in reality, it doesn't matter, as long as we have a small tool to generate the markdown file, I'm happy to manually copy and paste it into the provders.md file, so it can even be a bash file if that would make sense.

Aug 14 '22 22:08 bodiam

@bodiam I could just add any library in pom.xml? And another question, is it okay to place the file in test.java.net.datafaker.script?

Aug 14 '22 22:08 panilya

@panilya You can add any library to the pom.xml as long as it has a <scope>test</scope>, and yes, that seems like a good location (and we can always move it later if it's not, no problem)

Aug 14 '22 23:08 epragtbeamtree

@bodiam For me, phone number formats improved enough. If you think the same, you can close the issue

Aug 23 '22 17:08 panilya

Awesome! Great improvements there, thank you!

Aug 23 '22 21:08 bodiam

datafaker datafaker copied to clipboard

Improve the quality of phone numbers generated

datafaker
datafaker copied to clipboard