commons-text
TEXT-217: Add SnakeCase Parsing
This implementation adds a CasedString class that can convert between several different formats.
Initially supported formats:
- Camel case identifies strings like 'CamelCase'.
- Snake case identifies strings like 'Snake_Case'
- Kebab case identifies strings like 'kebab-case'
- Phrase case identifies phrases of words like 'phrase case'
- Dot case identifies strings of words like 'dot.case'
CasedString does not convert the character case except where mandated by the case, so Snake_Case converted to kebab case is Snake-Case, and kebab-case converted to snake case is kebab_case.
Other utilities are available to modify the character case.
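To make the "delimiter changes, letter case does not" behaviour concrete, here is a minimal sketch. It is not the CasedString class from the PR; the class name `DelimitedCaseSketch` and the method `convert` are hypothetical and only illustrate the idea for the delimiter-based formats (snake, kebab, phrase, dot).

```java
import java.util.regex.Pattern;

/** Hypothetical illustration only; not the CasedString API from the PR. */
final class DelimitedCaseSketch {

    /**
     * Splits {@code text} on {@code fromDelimiter} and re-joins the segments
     * with {@code toDelimiter}. The letter case of every segment is left as-is.
     */
    static String convert(String text, char fromDelimiter, char toDelimiter) {
        // Pattern.quote stops characters such as '.' being read as regex metacharacters.
        // The -1 limit keeps trailing empty segments instead of silently dropping them.
        String[] segments = text.split(Pattern.quote(String.valueOf(fromDelimiter)), -1);
        return String.join(String.valueOf(toDelimiter), segments);
    }

    public static void main(String[] args) {
        System.out.println(convert("Snake_Case", '_', '-')); // Snake-Case
        System.out.println(convert("kebab-case", '-', '_')); // kebab_case
        System.out.println(convert("dot.case", '.', ' '));   // dot case
    }
}
```

Camel case has no delimiter, so it would need its own splitting and joining rule; this sketch deliberately leaves it out.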
For me, camel case starts with a lower case like a Java method name or an Open API key.
The test does not make it easy to understand what is converted to what, or which round trips are supported.
Explicit tests are helpful as documentation.
The test does a cross product of cases.
I find that using WordUtils.capitalize() and WordUtils.uncapitalize() can address the SnakeCase capitalisation issues. I found I needed to generate both Java class names and argument names.
My strategy here is to take what was originally provided and use that. If character case changes are required, the user can make them in the calling code.
Adding a pair of methods to get the segments as a String[] and to take a String[] and create the phrase might be a useful addition.
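As a rough sketch of what such a pair of methods could look like for camel case (the one format without a delimiter), assuming Java 8 or later: the names `CamelSegmentsSketch`, `splitCamel`, and `joinCamel` are hypothetical and not part of the PR. The join leaves the first segment untouched, in line with the "use what was provided" strategy, and only upper-cases the first character of later segments because camel case mandates it.

```java
/** Hypothetical illustration only; names are not taken from the PR. */
final class CamelSegmentsSketch {

    /** Splits before every upper-case letter, e.g. "myJavaMethod" gives {"my", "Java", "Method"}. */
    static String[] splitCamel(String text) {
        // Zero-width lookahead: split positions sit just before an upper-case letter.
        return text.split("(?=\\p{Lu})");
    }

    /** Joins segments into camel case; the first segment is copied unchanged. */
    static String joinCamel(String[] segments) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            String segment = segments[i];
            if (i > 0 && !segment.isEmpty()) {
                // Only the case change that camel case itself requires.
                out.append(Character.toUpperCase(segment.charAt(0)))
                   .append(segment, 1, segment.length());
            } else {
                out.append(segment);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] segments = splitCamel("myJavaMethodName");
        System.out.println(String.join(" ", segments)); // my Java Method Name
        System.out.println(joinCamel(new String[] {"my", "java", "method"})); // myJavaMethod
    }
}
```

Exposing the segments this way would let callers apply their own case changes to each word before joining.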
Please see my reply in TEXT-217.
The tests are too obscure from my POV, and there is zero Javadoc to set user expectations (see link above). I'd like to see something we can point users to, like:
assertToCamelCase("MyJavaMethodName", "myJavaMethodName");
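A helper of that shape could be sketched roughly as below. Everything here is an assumption: the argument order (input first, expected second) is guessed from the example above, and `toCamelCase` is a hypothetical stand-in for the conversion under test that only lower-cases the first character. In the real test suite the helper would presumably delegate to JUnit's assertEquals rather than throw directly.

```java
/** Hypothetical illustration only; not code from the PR or its tests. */
final class CamelCaseExpectations {

    /** Stand-in for the conversion under test: lower-cases only the first character. */
    static String toCamelCase(String input) {
        if (input.isEmpty()) {
            return input;
        }
        return Character.toLowerCase(input.charAt(0)) + input.substring(1);
    }

    /** Assumed order: input first, expected result second. */
    static void assertToCamelCase(String input, String expected) {
        String actual = toCamelCase(input);
        if (!expected.equals(actual)) {
            throw new AssertionError("toCamelCase(\"" + input + "\"): expected \""
                    + expected + "\" but was \"" + actual + "\"");
        }
    }

    public static void main(String[] args) {
        // Each line documents one supported conversion.
        assertToCamelCase("MyJavaMethodName", "myJavaMethodName");
        assertToCamelCase("myJavaMethodName", "myJavaMethodName");
    }
}
```

One explicit line per conversion reads as documentation in a way that a cross product of cases does not.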
@Claudenw Please set this PR to Draft while we are still discussing it :-)
I haven't downloaded this PR to take a good look at it yet. Probably Monday. I'll pull it and run a few aggressive tests.
As far as conventions for camelCase, PascalCase, etc. go, I developed a pretty good list for a raw edit of CaseUtils in PR 528. These are conventions, not opinions on what they should or should not be. camelCase should always start in lower case; PascalCase should always start with upper case if converting to ASCII, or with title case (as defined by Unicode) if retaining Unicode characters.
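The difference between upper case and Unicode title case only shows up for a handful of characters, so a small sketch may help. The class and method names below are hypothetical and not taken from PR 528 or PR 450, and the sketch does not attempt any conversion to ASCII; it only contrasts `Character.toUpperCase` with `Character.toTitleCase` on the first code point.

```java
/** Hypothetical illustration only; not code from PR 528 or PR 450. */
final class InitialCaseSketch {

    /** camelCase convention: the first code point is lower-cased. */
    static String camelInitial(String word) {
        if (word.isEmpty()) {
            return word;
        }
        int first = word.codePointAt(0);
        return new StringBuilder()
                .appendCodePoint(Character.toLowerCase(first))
                .append(word, Character.charCount(first), word.length())
                .toString();
    }

    /**
     * PascalCase convention: the first code point is upper-cased, or title-cased
     * when {@code unicodeTitleCase} is true (the "retain Unicode" option).
     */
    static String pascalInitial(String word, boolean unicodeTitleCase) {
        if (word.isEmpty()) {
            return word;
        }
        int first = word.codePointAt(0);
        int converted = unicodeTitleCase
                ? Character.toTitleCase(first)
                : Character.toUpperCase(first);
        return new StringBuilder()
                .appendCodePoint(converted)
                .append(word, Character.charCount(first), word.length())
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(camelInitial("Name"));         // name
        System.out.println(pascalInitial("name", false)); // Name
        // U+01C6 (dz with caron digraph) has distinct upper-case (U+01C4)
        // and title-case (U+01C5) forms.
        System.out.println(pascalInitial("\u01C6ep", true).equals("\u01C5ep")); // true
    }
}
```

For plain Latin letters the two options give the same result, which is why the distinction only matters when Unicode input is retained.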
My PR tries to convert everything to lower ASCII that can be converted.
Dot case and phrase case are not included in my PR but are easily obtained through delimiters.
@theshoesiner took a completely different approach in PR 450 that retains Unicode characters instead of converting to lower ASCII Latin.
I'd thought of merging our two PRs and seeing if we could make something of it with an option to retain Unicode or convert to lower ASCII.
I haven't looked at yours yet, but I did write some good JUnit tests I can use on it to see how she handles.
As I said, I'll give that a shot sometime Monday.