jquery-encoder

canonicalize modifies an unencoded string

Open · krinsane opened this issue 13 years ago · 8 comments

In other words, it treats a string as encoded when it actually is not, so if I do something like

$.encoder.encodeForHTML($.encoder.canonicalize(string)), I get back a different string from the one I passed in.

The string in question is something like this: "sdf\sdf\sdf"

Canonicalize transforms it into this: sdf�sdf�sdf

krinsane, Mar 01 '12 22:03

I'm not sure there is a way around this problem: \s will be considered a control character and will be decoded by canonicalization. Even if you escaped the backslash (\\s), it would still be normalized and decoded on the subsequent pass. Is there any other character you could use in place of the backslash, which is the control-character marker in most programming languages? Changing the encoder would allow an attacker to pass control characters through multiple-encoding attacks, which is less than ideal.
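To illustrate why escaping the backslash does not survive canonicalization, here is a toy multi-pass decoder. It is a hypothetical sketch, not jquery-encoder's CSS codec: decodeBackslashOnce and canonicalizeToy are invented names, and the rule "a backslash followed by any character decodes to that character" is a simplification. Because canonicalization keeps decoding until the string stops changing, an escaped backslash is reduced on the first pass and the remaining \s on the second.

```javascript
// Hypothetical toy decoder, NOT jquery-encoder's CSSCodec:
// a backslash followed by any character decodes to that character.
function decodeBackslashOnce(s) {
  return s.replace(/\\(.)/g, '$1');
}

// Apply the decoder repeatedly until the string stops changing,
// roughly how multi-pass canonicalization reaches its "simplest form".
function canonicalizeToy(s) {
  let previous;
  do {
    previous = s;
    s = decodeBackslashOnce(s);
  } while (s !== previous);
  return s;
}

console.log(canonicalizeToy('sdf\\sdf'));   // input sdf\sdf   -> sdfsdf
console.log(canonicalizeToy('sdf\\\\sdf')); // input sdf\\sdf  -> sdfsdf
```

The escaped form sdf\\sdf only becomes sdf\sdf after one pass, and the next pass strips that backslash too, which is why escaping alone cannot protect it.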

chrisisbeef, Mar 02 '12 14:03

I think one way around this is to provide a separate canonicalize API for code. My use case is a regex typed into an input field. That way, people could choose which canonicalize function to use for values where code is expected; the same encoder would still be fine.

krinsane, Mar 03 '12 00:03

To follow up on my last comment:

Let's say I have a wrapper function, encodeForCode.

It would contain the following, where canonicalizeForCode is the proposed new function:

function encodeForCode(string) { return $.encoder.encodeForHTML($.encoder.canonicalizeForCode(string)); }

krinsane, Mar 03 '12 00:03

By its nature, canonicalization is intended to reduce a string to its simplest form, that is, to replace any escaped characters with their character representations, so there is only one canonicalize function. I'm not sure I see a use for more than that. I can, however, see a use case for allowing customization of the codecs that are used for canonicalization.

You would then be able to customize the behavior of canonicalization, including what it interprets as a control character.

Like this:

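function encodeForCode(strInput) {
   // Proposed options-object signature: canonicalize with only the HTML-entity and percent codecs,
   // so CSS-style backslash escapes are left alone; then HTML-encode the result
   return $.encoder.encodeForHTML($.encoder.canonicalize({ input: strInput, codecs: [ new HTMLEntityCodec(), new PercentCodec() ] }));
}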

This would keep the \ from being interpreted as a control character and canonicalized, since backslash escaping is CSS escaping syntax.

chrisisbeef, Mar 21 '12 01:03

Hey @stuartf - trying to close the loop on some of these older issues. Does the suggested fix accommodate your requirements?

chrisisbeef, Dec 11 '15 20:12

@nicolaasmatthijs @simong did we work around this somehow, or is it still a problem for oae?

stuartf, Dec 14 '15 21:12

I just ran into the exact same problem today when a legitimate user input string contained backslashes (an attempt to share a Windows file path, e.g. "c:\ext").

Going to look into the above suggestion by @chrisisbeef and will post the outcome.

desertdev, May 05 '16 18:05

Avoiding the CSSCodec in the canonicalize function worked for me.

Note to anyone else experiencing this problem: The example code above by @chrisisbeef is an incomplete hypothetical customization. The current canonicalize function has the codecs var hard-coded to use all 3 codecs. If you want to pass in different codecs as in the example above, the canonicalize function also needs to be modified.
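Following that note, here is a minimal sketch of what a modified canonicalize might look like, assuming it takes an options object with an input string and a codecs array as in the encodeForCode example. Everything here is hypothetical: htmlEntityCodec, percentCodec and cssCodec are simplified stand-ins invented for illustration (not jquery-encoder's HTMLEntityCodec, PercentCodec or CSSCodec), DEFAULT_CODECS plays the role of the hard-coded list, and the loop-until-stable strategy is an assumption about how multi-pass canonicalization behaves.

```javascript
// Simplified stand-in codecs, invented for this sketch. They are NOT
// jquery-encoder's HTMLEntityCodec, PercentCodec or CSSCodec.
const htmlEntityCodec = {
  // Decodes only a few named entities.
  decode: (s) => s.replace(/&(amp|lt|gt|quot);/g,
    (match, name) => ({ amp: '&', lt: '<', gt: '>', quot: '"' })[name]),
};
const percentCodec = {
  // Decodes %XX hex escapes one byte at a time (ASCII range is enough here).
  decode: (s) => s.replace(/%([0-9A-Fa-f]{2})/g,
    (match, hex) => String.fromCharCode(parseInt(hex, 16))),
};
const cssCodec = {
  // Toy rule: a backslash followed by any character decodes to that character.
  decode: (s) => s.replace(/\\(.)/g, '$1'),
};

// Stands in for the hard-coded list of all three codecs.
const DEFAULT_CODECS = [htmlEntityCodec, percentCodec, cssCodec];

// Hypothetical canonicalize that accepts the codec list as an option.
// It applies every codec in turn, repeating until the string stops changing.
function canonicalize({ input, codecs = DEFAULT_CODECS }) {
  let current = input;
  let previous;
  do {
    previous = current;
    for (const codec of codecs) {
      current = codec.decode(current);
    }
  } while (current !== previous);
  return current;
}

const codeSafe = [htmlEntityCodec, percentCodec]; // no CSS-style codec

console.log(canonicalize({ input: 'c:\\ext' }));                    // c:ext
console.log(canonicalize({ input: 'c:\\ext', codecs: codeSafe }));  // c:\ext
console.log(canonicalize({ input: '%26lt%3B', codecs: codeSafe })); // <
```

Dropping the CSS-style codec leaves the backslash in c:\ext alone, while entity and percent encoding, including the double-encoded %26lt%3B, are still fully reduced. The sketch makes no attempt to detect or reject multiple or mixed encoding.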

desertdev, May 05 '16 19:05