jquery-encoder
canonicalize modifies an unencoded string
In other words, it treats a string as encoded when it is not, so if I do something like
$.encoder.encodeForHTML($.encoder.canonicalize(string)), I get back a different string.
The string in question is something like this: "sdf\sdf\sdf"
Canonicalize transforms it into this: sdf�sdf�sdf
Not sure there is a way around this problem: \s will be considered a control character and will be decoded by canonicalization. Even if you escaped the backslash, it would still be normalized and decoded on the subsequent pass. Is there any other character that could be used in place of the backslash, which is the control character marker for most programming languages? Changing the encoder would allow an attacker to pass control characters using multiple-encoding attacks, which is less than ideal.
I think a way around this is to provide a second canonicalize API meant for code. My use case is a regex typed into an input field. People could then choose which canonicalize function to use for values where code is expected. The same encoder is fine.
To continue on my last comment:
Let's say I have a wrapper function encodeForCode
it will have the following:
function encodeForCode(string) { return $.encoder.encodeForHTML($.encoder.canonicalizeForCode(string)); }
By its nature, canonicalization is intended to reduce a string to its simplest form, that is, to replace any escaped characters with their character representations, so there is only 1 canonicalize function. Not sure I see a use for more than that. I can, however, see a use case for allowing customization of the codecs that are used for canonicalization.
So basically you would be able to customise the behavior of canonicalization and what it interprets as a control character.
Like this
function encodeForCode(strInput) {
  // Proposed signature: canonicalize takes the input plus the list of codecs to apply.
  return $.encoder.encodeForHTML($.encoder.canonicalize({input: strInput, codecs: [ new HTMLEntityCodec(), new PercentCodec() ]}));
}
This would eliminate the
Hey @stuartf - trying to close the loop on some of these older issues. Does the suggested fix accommodate your requirements?
@nicolaasmatthijs @simong did we work around this somehow, or is it still a problem for oae?
I just ran into the exact same problem today when a legitimate user input string contained backslashes (an attempt to share a windows file path, eg "c:\ext").
Going to look into the above suggestion by @chrisisbeef and will post the outcome.
Avoiding the CSSCodec in the canonicalize function worked for me.
Note to anyone else experiencing this problem: The example code above by @chrisisbeef is an incomplete hypothetical customization. The current canonicalize function has the codecs var hard-coded to use all 3 codecs. If you want to pass in different codecs as in the example above, the canonicalize function also needs to be modified.
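For anyone who wants to see the idea in isolation, here is a rough, self-contained sketch of a canonicalize function that takes its codec list as a parameter. Everything in it is an assumption for illustration: percentCodec, htmlEntityCodec, cssLikeCodec and this canonicalize are simplified stand-ins I made up, not the jquery-encoder implementations, and the real CSSCodec handles more cases than this one does.

```javascript
// Hypothetical stand-in codecs (NOT the jquery-encoder classes).
// Each one exposes decode(str) and resolves only its own escape syntax.

// Map a numeric code to a character, guarding against out-of-range values.
function fromCode(code) {
  return code <= 0x10ffff ? String.fromCodePoint(code) : "\uFFFD";
}

const percentCodec = {
  // %XX -> character
  decode: (s) =>
    s.replace(/%([0-9a-fA-F]{2})/g, (_, hex) => fromCode(parseInt(hex, 16))),
};

const htmlEntityCodec = {
  // Numeric entities plus a handful of named ones
  decode: (s) =>
    s
      .replace(/&#x([0-9a-fA-F]+);/g, (_, hex) => fromCode(parseInt(hex, 16)))
      .replace(/&#([0-9]+);/g, (_, dec) => fromCode(parseInt(dec, 10)))
      .replace(/&lt;/g, "<")
      .replace(/&gt;/g, ">")
      .replace(/&quot;/g, '"')
      .replace(/&amp;/g, "&"),
};

const cssLikeCodec = {
  // Backslash followed by 1 to 6 hex digits -> character (simplified)
  decode: (s) =>
    s.replace(/\\([0-9a-fA-F]{1,6})/g, (_, hex) => fromCode(parseInt(hex, 16))),
};

// Apply every codec in turn, and repeat until the string stops changing,
// so that multiply-encoded input is fully reduced.
function canonicalize(input, codecs) {
  let current = input;
  for (let pass = 0; pass < 10; pass++) {
    const next = codecs.reduce((s, codec) => codec.decode(s), current);
    if (next === current) break;
    current = next;
  }
  return current;
}

const allCodecs = [htmlEntityCodec, percentCodec, cssLikeCodec];
const withoutCss = [htmlEntityCodec, percentCodec];

const path = "c:\\ext"; // the characters c : \ e x t

// With the backslash codec, "\e" is read as an escape and the path changes.
const mangled = canonicalize(path, allCodecs);

// Without it, the path survives, and double percent-encoding is still unwound.
const preserved = canonicalize(path, withoutCss);
const unwound = canonicalize("%253Cb%253E", withoutCss);
```

In this sketch, leaving the backslash-handling codec out of the list keeps the Windows path intact while still reducing percent and HTML entity encodings. The trade-off mentioned earlier in the thread still applies: anything hidden behind a backslash escape is no longer decoded.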