datafaker
datafaker copied to clipboard
Improve the quality of phone numbers generated
At the moment, the quality of some generated phone numbers is quite poor. To demonstrate this, there's a test called PhoneNumberValidityFinderTest.testAllPhoneNumbers
, which generates the following output:
en_NZ=53
fr_CH=57
lv_LV=60
pt_PT=66
en_MS=69
tr_TR=84
zh_CN=86
hu_HU=93
en_PH=93
by_BY=94
ar_AR=97
no_NO=99
zh_TW=100
sv_SV=100
es_AR=100
bg_BG=100
This means that for the locales zh_TW
, sv_SV
, es_AR
and bg_BG
currently all phone numbers are generated incorrectly.
It would be great if we can improve the quality here a little bit.
Bulgarian (BG) phone numbers are fixed now.
no_NO fixed now too.
@bodiam Should we divide each locale problem to different issues? As for me, it would be better to solve this big problem one-by-one.
You're probably right, but github is a bit limiting. I thought it would be fine to just put a comment here. I didn't really want to create ~30 issues just for the phone numbers.
Current state of PhoneNumberValidityFinderTest.testAllPhoneNumbers
test:
sk_SK=65
lv_LV=72
fr_CH=73
en_MS=74
pt_PT=76
tr_TR=84
zh_CN=89
hu_HU=94
by_BY=95
ar_AR=97
es_AR=98
sv_SV=100
@bodiam sv_SV
is this the correct one? Because I cannot find any information about it. Maybe it's sv_SE
?
@panilya I'm not 100% sure. There's a script which generates this output, there might be an issue in the script. I know there was some mixup or so in the locales, but not 100% sure, sorry!
@bodiam List of locales. Cannot find the sv_SV
locale, I will try to fix this later.
Current state of
PhoneNumberValidityFinderTest.testAllPhoneNumbers
test:sk_SK=65 lv_LV=72 fr_CH=73 en_MS=74 pt_PT=76 tr_TR=84 zh_CN=89 hu_HU=94 by_BY=95 ar_AR=97 es_AR=98 sv_SV=100
es-AR
and zh-CN
are fixed now
Current state of
PhoneNumberValidityFinderTest.testAllPhoneNumbers
test:sk_SK=65 lv_LV=72 fr_CH=73 en_MS=74 pt_PT=76 tr_TR=84 zh_CN=89 hu_HU=94 by_BY=95 ar_AR=97 es_AR=98 sv_SV=100
es-AR
andzh-CN
are fixed now
en-MS
and en-NZ
are fixed
Update:
en_AU=53
pt_PT=60
sk_SK=61
lv_LV=65
fr_CH=72
tr_TR=86
by_BY=94
hu_HU=96
ar_AR=99
sv_SV=100
Current state of
PhoneNumberValidityFinderTest.testAllPhoneNumbers
test:sk_SK=65 lv_LV=72 fr_CH=73 en_MS=74 pt_PT=76 tr_TR=84 zh_CN=89 hu_HU=94 by_BY=95 ar_AR=97 es_AR=98 sv_SV=100
es-AR
andzh-CN
are fixed now
en-MS
anden-NZ
are fixedUpdate:
en_AU=53 pt_PT=60 sk_SK=61 lv_LV=65 fr_CH=72 tr_TR=86 by_BY=94 hu_HU=96 ar_AR=99 sv_SV=100
Update:
de_AT=55
en_AU=59
sk_SK=62
lv_LV=64
hu_HU=95
ar_AR=99
sv_SV=100
@bodiam At that moment, there are only 2 locales with error code > 50 - ar_AR
, sv_SV
, these two locales are not countries, it's just languages, so what should I do with this?
@panilya how about we just close the ticket? It has improved quite a bit, so to me it's good enough.
@bodiam I wanted to do it. There are like +-4-5 locales with error code > 35. Maybe open another issue for these locales or it's better to not?
@bodiam I have a question not related to this issue. You have mentioned a tool for generation providers.md
, I work on it and have questions about implementation. How to generate a table in .md
format? Am I allowed to install a new library or try to write some custom implementation for generating tables? Also, how to get Since
field? Use Reflection API or something related to it?
@bodiam I wanted to do it. There are like +-4-5 locales with error code > 35. Maybe open another issue for these locales or it's better to not?
I think you can lower the error threshold, and use the same issue if you want?
@bodiam I have a question not related to this issue. You have mentioned a tool for generation
providers.md
, I work on it and have questions about implementation. How to generate a table in.md
format? Am I allowed to install a new library or try to write some custom implementation for generating tables? Also, how to getSince
field? Use Reflection API or something related to it?
In regards to new libraries, I'd like to keep the number of libraries used for the Datafaker core to an absolute minimum, but for tools like this, this is not applicable, since they won't get distributed as part of the Datafaker-core anyway. This means that you can use any tool, library, framework, etc you see fit, no objection at all.
What I had in mind myself for the providers.md
generation was that I parse the source files, and create an output .md
table with some printlns. Really basic stuff, but the parsing of the source files was a bit harder (for example, I'd like to get the @since
and the Javadoc, but sometimes the Javadoc is multiline, or there's none, and it just made parsing a bit harder than I had the time for).
You could also use Reflection API, if that's easier, but I considered to use something like JavaParser or so, but in reality, it doesn't matter, as long as we have a small tool to generate the markdown file, I'm happy to manually copy and paste it into the provders.md
file, so it can even be a bash file if that would make sense.
@bodiam I could just add any library in pom.xml? And another question, is it okay to place the file in test.java.net.datafaker.script
?
@panilya You can add any library to the pom.xml
as long as it has a <scope>test</scope>
, and yes, that seems like a good location (and we can always move it later if it's not, no problem)
@bodiam For me, phone number formats improved enough. If you think the same, you can close the issue
Awesome! Great improvements there, thank you!