guava CaseFormat to detect format of a string (value)

It would be nice to have a function that tests which case format a particular string is in.

In a tool application, a string is provided by external code. If the value matches some known case format, it is then normalised to a standard format, e.g. inputs matching UPPER_UNDERSCORE and LOWER_UNDERSCORE would both be converted to LOWER_CAMEL.

Some ideas are illustrated below:

/**
 * Tests which CaseFormat the <code>value</code> is.
 *
 * @param value
 * @return the case format for the value if found. <code>null</code> if no matches found.
 */
@Nullable
public static CaseFormat test(@Nonnull String value) {
    // In the worst case, this may need to enumerate all formats.
    // Quick tests may help, e.g. first detect an underscore, then check letter case.
    return null; // placeholder: detection logic not implemented
}

/**
 * Tests whether <code>value</code> is in this case format.
 *
 * @param value
 * @return <code>true</code> if <code>value</code> is matching case format.
 */
abstract boolean matches(@Nonnull String value);

Oct 31 '15 11:10 dopsun

Is "whatever" LOWER_HYPHEN or LOWER_CAMEL? Also it could be LOWER_UNDERSCORE.

Nov 02 '15 17:11 ineuwirth

The case I'm facing now is:

First, detect which case format the incoming identifier is in (this is what is requested here).
Then, if it is a known case format but not the target format, convert it into the target format (this is an existing feature).

Nov 03 '15 13:11 dopsun

I was just trying to highlight that CaseFormats can be ambiguous. Your new method test might return LOWER_HYPHEN, LOWER_CAMEL or LOWER_UNDERSCORE for the input "whatever".

Nov 03 '15 13:11 ineuwirth

Ah, I get you now. Sorry for misunderstanding your comment in the first place.

Regarding the particular case "whatever": since there is no hyphen and no underscore, it should return LOWER_CAMEL. I do agree that there can be ambiguity; say, it could return LOWER_ALL as well, if such a case format existed.

On the other hand, at least for a particular format, there can be a matches function, as drafted in my original post. In this case, my tool can enumerate the supported formats one by one, and naturally there is an ordering, since "one by one" happens in app code.

Also, even if the static function in my original post is non-deterministic, e.g. in the LOWER_CAMEL and LOWER_ALL case, it can either return the first match in natural order (non-deterministic), e.g. the definition sequence of the enum, or it can return a list of all matches (deterministic). I'm not sure about other scenarios, but for my tool either way works, since the requirement in my case is to find out what the current format is and, if it's a known format, convert it to a normalised format.
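The two options discussed above (a per-format matches check, and returning either the first match or a list of all matches) can be sketched roughly as follows. This is only an illustration, not Guava API: the enum `DetectableFormat`, its regular expressions, and the method `allMatching` are hypothetical names introduced here, and the patterns are assumptions about what each format should accept (ASCII letters and digits only).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in mirroring the names of Guava's CaseFormat constants; not part of Guava. */
enum DetectableFormat {
    LOWER_HYPHEN("[a-z][a-z0-9]*(-[a-z0-9]+)*"),
    LOWER_UNDERSCORE("[a-z][a-z0-9]*(_[a-z0-9]+)*"),
    LOWER_CAMEL("[a-z][a-z0-9]*([A-Z][a-z0-9]*)*"),
    UPPER_CAMEL("([A-Z][a-z0-9]*)+"),
    UPPER_UNDERSCORE("[A-Z][A-Z0-9]*(_[A-Z0-9]+)*");

    // Assumed pattern for this format; the whole value must match it.
    private final Pattern pattern;

    DetectableFormat(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    /** Returns true if the entire value fits this format's assumed pattern. */
    boolean matches(String value) {
        return pattern.matcher(value).matches();
    }

    /** Returns every format the value fits, in enum declaration order. */
    static List<DetectableFormat> allMatching(String value) {
        List<DetectableFormat> result = new ArrayList<>();
        for (DetectableFormat format : values()) {
            if (format.matches(value)) {
                result.add(format);
            }
        }
        return result;
    }
}
```

With these assumed patterns, `allMatching("whatever")` contains LOWER_HYPHEN, LOWER_UNDERSCORE and LOWER_CAMEL, which makes the ambiguity explicit; a caller that wants a single answer can take the first element or apply its own priority order.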

Nov 03 '15 14:11 dopsun

To argue successfully for a new feature, one thing we need is a good set of real-world use cases for which the feature is a clear win, along with a reason to believe this need comes up with any reasonable frequency among our users. This one seems very unlikely to clear the bar, but if you feel strongly and can provide the requested info, you can reopen.

Nov 19 '15 20:11 kevinb9n

Thanks @kevinb9n

I may not reopen this issue, but I would like to provide more details about my case before letting this ticket rest in peace:

The current CaseFormat handles Known to Known conversion. My problem is Unknown to Known. I have a data dictionary in XML format, and my tool generates code for Java and .Net. An Id inside the XML file may follow the Java convention (camel), the .Net convention (first letter capital), lower_lower, or UPPER_UPPER. I need this feature to first tell me which known format the Id is in and, if it's a supported format, to generate the Java/.Net format id as well as the constant definition.

As far as I can see, it could also be useful for IDE code formatters and refactoring tools, but I haven't built tools like that before, so this may not be the case. And even if these are valid cases, I am not sure whether they meet the criteria. Maybe not.

On the other hand, I cannot find any utility that satisfies this need either, and that's the reason why I'm asking here.

Nov 20 '15 14:11 dopsun

+1 - similar use case to dopsun

Alternatively, what we are asking for is to convert to a given case, no matter what the input format is.

With the reasoning given above, the source format really does not matter as long as we are able to go from an unknown case to a known case. For example, if "whatever" was provided with a request to convert to UPPER_CAMEL, it would return "Whatever", while "whatEver" or "what_ever" would return "WhatEver".
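One way to read this request is to split the input into words regardless of its source format and then re-join the words in the target style. The following is a minimal sketch under that assumption; `AnyCaseConverter`, its nested `Target` enum, and the splitting rule (underscores, hyphens, and a lowercase letter or digit followed by an uppercase letter) are hypothetical and not part of Guava. Runs of capitals such as "XMLHttp" are not split by this rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: converts an identifier of unknown case format to a target style. */
final class AnyCaseConverter {
    /** Hypothetical target styles, named after Guava's CaseFormat constants. */
    enum Target { LOWER_HYPHEN, LOWER_UNDERSCORE, LOWER_CAMEL, UPPER_CAMEL, UPPER_UNDERSCORE }

    /** Splits on '_', '-', and lower-or-digit to upper transitions; words are lower-cased. */
    static List<String> words(String input) {
        List<String> words = new ArrayList<>();
        for (String part : input.split("[_-]+|(?<=[a-z0-9])(?=[A-Z])")) {
            if (!part.isEmpty()) { // skip empty pieces from leading separators
                words.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    private static String capitalize(String word) {
        return Character.toUpperCase(word.charAt(0)) + word.substring(1);
    }

    /** Re-joins the words of the input in the requested target style. */
    static String convert(String input, Target target) {
        StringBuilder out = new StringBuilder();
        for (String word : words(input)) {
            boolean first = out.length() == 0; // words are never empty
            switch (target) {
                case LOWER_HYPHEN:
                    out.append(first ? "" : "-").append(word);
                    break;
                case LOWER_UNDERSCORE:
                    out.append(first ? "" : "_").append(word);
                    break;
                case UPPER_UNDERSCORE:
                    out.append(first ? "" : "_").append(word.toUpperCase(Locale.ROOT));
                    break;
                case LOWER_CAMEL:
                    out.append(first ? word : capitalize(word));
                    break;
                case UPPER_CAMEL:
                    out.append(capitalize(word));
                    break;
            }
        }
        return out.toString();
    }
}
```

Under these assumptions, converting "whatever" to UPPER_CAMEL gives "Whatever", while "whatEver" and "what_ever" both give "WhatEver", matching the examples in the comment.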

Nov 22 '15 23:11 vshank77

Hello,

In a similar case, I have to validate whether a given name for a location is camelCase or CamelCase and then convert it to LOWER_UNDERSCORE.

So I need to compare the return value of newLocation.getName() against something that identifies which case format it is in; then it is possible to convert from that format to the required LOWER_UNDERSCORE.

Thanks a lot!!

Dec 02 '15 13:12 Cfsattva

+5

I don't really care what the current case format for a given incoming string is, but whatever it is, I want to convert it to a specific case format. I'm not sure what systems could make use of finding out what format a string currently is in, but I would think that there's a not-small class of software that would be interested in taking in a string from some external, possibly untrusted, source and converting it to a specific case format.

We have a system that uses a case format for different fields on a variety of data for a variety of scenarios (usually it's display scenarios we care about), and we are not always in control of the input.

The alternative is to run through a bunch of guesses on the source format and see which conversion to a target format yields an identical string. Only then can the conversion be made.

May 03 '17 22:05 chefhoobajoob

I just ran into a need for this as well, and we've had a few reports internally asking for something like this. I think the most reasonable API would be to have an instance method which returns true/false for whether a given string matches the CaseFormat.

Would that satisfy most use cases here?

Oct 31 '17 22:10 kluever

+1 This will help.

Nov 01 '17 13:11 dopsun

I just ran into this too, so I created a method that returns the CaseFormat of a given string:

private CaseFormat getCaseFormatName(String s) throws IllegalArgumentException {

        if (s.contains("_")) {

            if (s.toUpperCase().equals(s))
                return CaseFormat.UPPER_UNDERSCORE;

            if (s.toLowerCase().equals(s))
                return CaseFormat.LOWER_UNDERSCORE;

        } else if (s.contains("-")) {

            if (s.toLowerCase().equals(s))
                return CaseFormat.LOWER_HYPHEN;

        } else {

            if (Character.isLowerCase(s.charAt(0))) {

                if (s.matches("([a-z]+[A-Z]+\\w+)+"))
                    return CaseFormat.LOWER_CAMEL;

            } else {

                if (s.matches("([A-Z]+[a-z]+\\w+)+"))
                    return CaseFormat.UPPER_CAMEL;
            }
        }

        throw new IllegalArgumentException("Couldn't find the case format of the given string.");
    }

In case anyone finds it useful.

Jan 17 '18 16:01 AhmedMourad0

@AhmedMouradDev That one would throw when given whatever.

Sep 04 '19 09:09 whiskeysierra

Hello. s.matches("([A-Z]+[a-z]+\\w+)+") fails to recognize cases where a digit sits between two capital letters. Could you fix the regexp for the test case "C2ControlFeatureCategoryCodeType"?
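A possible relaxation of the camel-case patterns is sketched below. It treats each word as one capital letter followed by any number of lowercase letters or digits, which is an assumption about the intended rule rather than anything defined by Guava; the class `CamelPatterns` and its members are hypothetical names. The original UPPER_CAMEL pattern from the snippet earlier in this thread is kept alongside for comparison.

```java
import java.util.regex.Pattern;

/** Hypothetical helper comparing the original camel-case pattern with a relaxed one. */
final class CamelPatterns {
    // UPPER_CAMEL pattern from the snippet earlier in this thread, unchanged.
    static final Pattern ORIGINAL_UPPER_CAMEL = Pattern.compile("([A-Z]+[a-z]+\\w+)+");

    // Assumed relaxation: a word is one capital followed by lowercase letters or digits.
    static final Pattern UPPER_CAMEL = Pattern.compile("([A-Z][a-z0-9]*)+");

    // Same idea for LOWER_CAMEL: a lowercase first word, then zero or more capitalized words.
    static final Pattern LOWER_CAMEL = Pattern.compile("[a-z][a-z0-9]*([A-Z][a-z0-9]*)*");

    static boolean isUpperCamel(String s) {
        return UPPER_CAMEL.matcher(s).matches();
    }

    static boolean isLowerCamel(String s) {
        return LOWER_CAMEL.matcher(s).matches();
    }
}
```

With these patterns, "C2ControlFeatureCategoryCodeType" is accepted as UPPER_CAMEL and a single lowercase word such as "whatever" is accepted as LOWER_CAMEL. The trade-off is that an all-capitals word such as "WHATEVER" also passes the relaxed UPPER_CAMEL check, so callers may want to test the underscore formats first.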

Oct 27 '20 15:10 EMaksymenko

This feature will also be helpful for my team.

We manage various sources of data, each with a different string format, and we would like to store the data in our DB in a single format.

The process should be the following:

Source of Data 1 (format x) -> identify format -> transform to DEFAULT_FORMAT -> store in DB
Source of Data 2 (format y) -> identify format -> transform to DEFAULT_FORMAT -> store in DB
...

Jan 05 '23 13:01 realfranser

Hello, I think what we want here is just to convert from whatever format into a given expected format. The most important thing would be to concentrate on the target format. If so, we could take inspiration from this Python code from a Sublime Text plugin project: https://github.com/mitranim/sublime-caser/blob/master/sublime-caser.py

So I think that imposing the source format is not a good idea :thinking:

Jul 24 '23 11:07 pilak

I have a similar use case where the input string can be either kebab case or camel case and needs to be converted into snake case. But what I see is that this issue has been open since 2015; did the development team not think it was worth looking into?

May 08 '24 16:05 manpreetsoftodia
