esapi-java-legacy icon indicating copy to clipboard operation
esapi-java-legacy copied to clipboard

Please improve efficiency of string-building code

Open meg23 opened this issue 10 years ago • 2 comments

From [email protected] on January 26, 2011 19:16:16

Please change java.lang.Character to String conversion to be more efficient. Specifically, instead of ""+c to build string, use String.valueOf(c). When I test on JDK 1.6.0_17, doing ""+c results in a new StringBuilder being unnecessarily created.

I noticed this, for example, in the HTMLEntityCodec.encodeCharacter(char[] immune, Character c). Instead, it could be written as

public String encodeCharacter( char[] immune, Character c ) {

    // check for immune characters
    if ( containsCharacter(c, immune ) ) {
        return String.valueOf(c);
    }

    // check for alphanumeric characters
    String hex = Codec.getHexForNonAlphanumeric(c);
    if ( hex == null ) {
        return String.valueOf(c);
    }

    // check for illegal characters
    if ( ( c <= 0x1f && c != '\t' && c != '\n' && c != '\r' ) || ( c >= 0x7f && c <= 0x9f ) )
    {
        hex = REPLACEMENT_HEX;  // Let's entity encode this instead of returning it
        c = REPLACEMENT_CHAR;
    }

    // check if there's a defined entity
    String entityName = (String) characterToEntityMap.get(c);
    if (entityName != null) {
        return "&" + entityName + ";";
    }

    // return the hex entity as suggested in the spec
    return "&#x" + hex + ";";
}

Along those lines, this could further be improved by changing the API of the encodeCharacter() so it could append to a provided StringBuilder. This way new Strings/StringBuilders don't need to be built up as characters are encoded. Although this would be needed on all the codecs, I did enclose the suggested code change for the base Codec class and the HTMLEntityCodec class.

All these extra objects being created, among other things, will be more work for the garbage collector. I wouldn't care too much if we only made occasional calls to it, but in the course of building up an HTML page, we could be making thousands of calls in to this method. Now look at that page being rendered under load and we're quickly in to the millions of invocations.

Thanks for your consideration,

Eric

Attachment: Codec.java HTMLEntityCodec.java

Original issue: http://code.google.com/p/owasp-esapi-java/issues/detail?id=199

meg23 avatar Nov 13 '14 18:11 meg23

From [email protected] on January 26, 2011 16:27:11

Actually, the code I attached is still doing more work than needed. If the StringBuilder is passed in to the encodeCharacter method, then String.valueOf() is unnecessary and overkill. The Character should be passed directly in to the StringBuilder. For example:

public StringBuilder encodeCharacterAndAppend(StringBuilder sb, char[] immune, Character c ) {

    // check for immune characters
    if ( containsCharacter(c, immune ) ) {
        return sb.append(c);
    }

    // check for alphanumeric characters
    String hex = Codec.getHexForNonAlphanumeric(c);
    if ( hex == null ) {
        return sb.append(c);
    }

I've uploaded the new HTMLEntityCode with this change.

Attachment: HTMLEntityCodec.java

meg23 avatar Nov 13 '14 18:11 meg23

From [email protected] on November 13, 2014 10:13:29

Labels: Type-Defect

meg23 avatar Nov 13 '14 18:11 meg23