convert-case
Wrong boundary detected when converting from camel case
Hi there, I really like your crate. However, I came across a small issue. If the second character of a camel case string is uppercase, no boundary should be inserted after the first character.
Here are two test cases:
Fails with "foo_bar" != "f_oo_bar":
use convert_case::{Case, Casing};

#[test]
fn test_camel_case() {
    assert_eq!("foo_bar", "fOOBar".from_case(Case::Camel).to_case(Case::Snake));
}
Succeeds:
#[test]
fn test_pascal_case() {
    assert_eq!("foo_bar", "FOOBar".from_case(Case::Camel).to_case(Case::Snake));
}
It's probably a fairly niche bug, and it is easy to work around by always making the first letter uppercase and then converting from pascal case instead.
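The workaround described here could look roughly like the following. This is only a sketch: `capitalize_first` is a hypothetical helper name, not part of convert_case, and it uses only the standard library.

```rust
/// Hypothetical helper (not part of convert_case): returns a copy of `s`
/// with its first character uppercased and the rest left unchanged.
fn capitalize_first(s: &str) -> String {
    let mut chars = s.chars();
    match chars.next() {
        // `to_uppercase` yields an iterator because one char may map to
        // several; chain the untouched remainder after it.
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => String::new(),
    }
}
```

With this, "fOOBar" becomes "FOOBar", which can then be converted with from_case and to_case as in the succeeding test above.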
Hello there,
This is not a bug; it is intended behavior. Recall that from_case really just pulls a list of Boundary values that are commonly associated with that case. In camel case you would expect a lowercase letter followed by an uppercase letter to be a boundary (aA), for example. There are boundaries for digits as well. Luckily, convert_case lets you easily see the boundaries associated with a case. Here are those for camel case.
println!("{:?}", Case::Camel.boundaries());
[LowerUpper, Acronym, LowerDigit, UpperDigit, DigitLower, DigitUpper]
We can also look at all the possible boundaries that can be identified in a provided string. Let's look at what is in your example strings.
println!("{:?}", Boundary::list_from("FOOBar"));
[Acronym]
println!("{:?}", Boundary::list_from("fOOBar"));
[LowerUpper, Acronym]
FOOBar contains the acronym boundary, and because that is in camel case's boundaries it is used as the point to split the string into words. It gets split to create FOO and Bar, which are then combined into foo_bar as snake case.
fOOBar contains the acronym boundary AND the lowerupper boundary. This lowerupper boundary is at the first two characters, fO. This is also in camel case's list of boundaries, so the string is split into f, OO, and Bar, which are combined into f_oo_bar as snake case.
This lowerupper boundary is expected for camel case, since that's how we join words. The end of one word is lowercase and the next begins with uppercase. In the case of fOOBar, the first word is f, followed by OOBar.
All that is to say this is expected behavior.
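To make the splitting concrete, here is a simplified sketch of the two boundaries involved. It is not the crate's implementation: `split_words` and `to_snake` are hypothetical names, the digit boundaries are ignored, and only the LowerUpper (aA) and Acronym (AAa) rules as described above are modeled.

```rust
/// Hypothetical, simplified model of two boundaries (not the crate's code).
/// Splits `s` wherever a LowerUpper or Acronym boundary is found.
fn split_words(s: &str) -> Vec<String> {
    let chars: Vec<char> = s.chars().collect();
    let mut words: Vec<String> = Vec::new();
    let mut start = 0;
    for i in 1..chars.len() {
        let prev = chars[i - 1];
        let cur = chars[i];
        // LowerUpper: a lowercase letter followed by an uppercase one ("aA").
        let lower_upper = prev.is_lowercase() && cur.is_uppercase();
        // Acronym: two uppercase letters followed by a lowercase one ("AAa");
        // the split falls before the last uppercase letter.
        let acronym = prev.is_uppercase()
            && cur.is_uppercase()
            && chars.get(i + 1).map_or(false, |c| c.is_lowercase());
        if lower_upper || acronym {
            words.push(chars[start..i].iter().collect::<String>());
            start = i;
        }
    }
    // Push whatever remains after the last boundary.
    if start < chars.len() {
        words.push(chars[start..].iter().collect::<String>());
    }
    words
}

/// Lowercases each word and joins the words with underscores.
fn to_snake(s: &str) -> String {
    split_words(s)
        .iter()
        .map(|w| w.to_lowercase())
        .collect::<Vec<_>>()
        .join("_")
}
```

Under these assumptions, "fOOBar" splits at fO (LowerUpper) and again before Bar (Acronym), while "FOOBar" only has the Acronym split, which matches the two results discussed in this thread.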
Thanks for the answer,
I already dug around the code a bit, and I understand that what happened is expected based on the boundary logic. However, I think that in the case of fOOBar the more correct way would be to ignore the first boundary / treat the first letter as uppercase.
It would probably take an ugly patch to "fix" it (make it behave the way I consider more correct), and the user-side fix is quite easy to do.
The main motivation of this bug report was to make you aware of this and then maybe tell others who are looking for this that they have to manually fix it on their end.
Cheers