esapi-java-legacy
esapi-java-legacy copied to clipboard
Please improve efficiency of string-building code
From [email protected] on January 26, 2011 19:16:16
Please change java.lang.Character to String conversion to be more efficient. Specifically, instead of ""+c to build string, use String.valueOf(c). When I test on JDK 1.6.0_17, doing ""+c results in a new StringBuilder being unnecessarily created.
I noticed this, for example, in the HTMLEntityCodec.encodeCharacter(char[] immune, Character c). Instead, it could be written as
public String encodeCharacter( char[] immune, Character c ) {
// check for immune characters
if ( containsCharacter(c, immune ) ) {
return String.valueOf(c);
}
// check for alphanumeric characters
String hex = Codec.getHexForNonAlphanumeric(c);
if ( hex == null ) {
return String.valueOf(c);
}
// check for illegal characters
if ( ( c <= 0x1f && c != '\t' && c != '\n' && c != '\r' ) || ( c >= 0x7f && c <= 0x9f ) )
{
hex = REPLACEMENT_HEX; // Let's entity encode this instead of returning it
c = REPLACEMENT_CHAR;
}
// check if there's a defined entity
String entityName = (String) characterToEntityMap.get(c);
if (entityName != null) {
return "&" + entityName + ";";
}
// return the hex entity as suggested in the spec
return "&#x" + hex + ";";
}
Along those lines, this could further be improved by changing the API of the encodeCharacter() so it could append to a provided StringBuilder. This way new Strings/StringBuilders don't need to be built up as characters are encoded. Although this would be needed on all the codecs, I did enclose the suggested code change for the base Codec class and the HTMLEntityCodec class.
All these extra objects being created, among other things, will be more work for the garbage collector. I wouldn't care too much if we only made occasional calls to it, but in the course of building up an HTML page, we could be making thousands of calls in to this method. Now look at that page being rendered under load and we're quickly in to the millions of invocations.
Thanks for your consideration,
Eric
Attachment: Codec.java HTMLEntityCodec.java
Original issue: http://code.google.com/p/owasp-esapi-java/issues/detail?id=199
From [email protected] on January 26, 2011 16:27:11
Actually, the code I attached is still doing more work than needed. If the StringBuilder is passed in to the encodeCharacter method, then String.valueOf() is unnecessary and overkill. The Character should be passed directly in to the StringBuilder. For example:
public StringBuilder encodeCharacterAndAppend(StringBuilder sb, char[] immune, Character c ) {
// check for immune characters
if ( containsCharacter(c, immune ) ) {
return sb.append(c);
}
// check for alphanumeric characters
String hex = Codec.getHexForNonAlphanumeric(c);
if ( hex == null ) {
return sb.append(c);
}
I've uploaded the new HTMLEntityCode with this change.
Attachment: HTMLEntityCodec.java