How can I add an exception so that & is not escaped?
I'm trying to use this as an ASP.NET request filter module for all POST/PUT/DELETE requests, and I don't want & to be encoded, because that breaks the ASP.NET Web Forms application. I also don't think & can cause any harm in HTML.
What exactly is breaking the web application? Perhaps you can try replacing the formatter.
ASP.NET Web Forms uses &, ; and " for its internal postback, and encoding those characters causes problems.
@mganss How can I replace that formatter so that & does not get encoded?
Create your own implementation of IMarkupFormatter and assign an instance of it to the OutputFormatter property. You might want to derive it from an existing formatter class, such as this one: https://github.com/mganss/HtmlSanitizer/blob/61008c6d0e492e641510726da881ee0c9577c305/src/HtmlSanitizer/HtmlFormatter.cs
How does Web Forms use & and the other characters in a way that causes problems?
@mganss For example, the ASP.NET web form posts data like the following, where we cannot encode &:
FirstName=Jay&LastName=SHAH&Details=<div onload="alert('xss')"style="background-color: test"> Hello World !&;"'=$? <img src="test.gif"style="background-image: url(javascript:alert('xss')); margin: 10px">
In the sanitizer output (posted as a screenshot), & gets encoded, but that's not the expected output.

@mganss In a custom OutputFormatter, which method or property should I override to stop & from being encoded? Overriding the Attribute method doesn't help here.
I'm confused. What exactly is the input for the sanitizer? Where does the input come from? Please post some demo code.
@mganss Below is the input (three form fields, FirstName, LastName and Details, posted using a form POST):
FirstName=Jay&LastName=SHAH&Details=<div onload="alert('xss')"style="background-color: test"> Hello World !&;"'=$? <img src="test.gif"style="background-image: url(javascript:alert('xss')); margin: 10px">
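I think you're applying the sanitization at the wrong level. The sanitizer parses its input as HTML, so if you pass it the whole form body, the & characters that separate the fields are treated as text and come out as &amp;. You should probably get only the value of the Details form variable and sanitize that:
```csharp
// `sanitizer` is an HtmlSanitizer instance
var details = Request.Form["Details"];
details = sanitizer.Sanitize(details);
```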
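To make the "wrong level" point concrete, here is a small illustrative sketch of what happens when the ampersands in a raw form body are HTML-escaped before ASP.NET parses it. `EscapeAmpersands` is only a rough stand-in for the sanitizer's text escaping, not HtmlSanitizer's API, and the class and method names are hypothetical.

```csharp
using System.Collections.Specialized;
using System.Web; // HttpUtility: System.Web on .NET Framework, also available on .NET Core 2.0+

public static class BodyEscapingDemo
{
    // Rough stand-in for what an HTML serializer does to '&' in text:
    // every '&' becomes "&amp;". This is NOT HtmlSanitizer itself.
    public static string EscapeAmpersands(string body) => body.Replace("&", "&amp;");

    // Parses the body the way a form handler would, after the ampersands were escaped.
    public static NameValueCollection ParseAfterEscaping(string body) =>
        HttpUtility.ParseQueryString(EscapeAmpersands(body));
}
```

For the body `FirstName=Jay&LastName=SHAH`, `ParseAfterEscaping` yields `FirstName` = `"Jay"`, no `LastName` at all, and a new key `amp;LastName` holding `"SHAH"`: the field boundaries are corrupted, which is the kind of damage that breaks postbacks.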
@mganss Yes, but unfortunately that's not an option: this is a huge legacy application with hundreds of pages, many of which have fields that can contain unsafe HTML. Is it possible to sanitize the entire request body? The encoding of & seems to be the biggest blocker.
@mganss It would be a great help if you could suggest another way in your library to sanitize the entire request body, so we can just plug it into an ASP.NET global request filter without worrying about individual pages, methods or fields. It's fine if we need to handle different content types separately, such as application/x-www-form-urlencoded, multipart/form-data, application/json, text/plain, etc.
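For the application/x-www-form-urlencoded case, one possible approach is to parse the body into fields, sanitize only the decoded values, and re-encode the result, so the sanitizer never sees the `&` and `=` separators. This is a minimal sketch assuming the whole body has already been buffered into a string (a request filter's Read override receives it in chunks, so it would need buffering first). `SanitizeUrlEncodedBody` is a hypothetical helper, and `sanitize` is a delegate standing in for HtmlSanitizer; multipart, JSON and plain-text bodies would each need their own parser.

```csharp
using System;
using System.Collections.Generic;
using System.Web; // HttpUtility

public static class FormBodySanitizer
{
    // Hypothetical helper: parses an application/x-www-form-urlencoded body,
    // runs `sanitize` on each decoded value only, and re-encodes the result.
    // The '&' and '=' separators never pass through the sanitizer, so they survive.
    public static string SanitizeUrlEncodedBody(string body, Func<string, string> sanitize)
    {
        var form = HttpUtility.ParseQueryString(body); // decodes keys and values
        var pairs = new List<string>();
        foreach (var key in form.AllKeys)
        {
            var values = form.GetValues(key); // one entry per repeated field
            if (values == null) continue;
            foreach (var value in values)
            {
                var encodedValue = HttpUtility.UrlEncode(sanitize(value));
                // A null key comes from a segment without '=', so keep it bare.
                pairs.Add(key == null ? encodedValue : HttpUtility.UrlEncode(key) + "=" + encodedValue);
            }
        }
        return string.Join("&", pairs);
    }
}
```

With HtmlSanitizer it would be called with a lambda such as `v => sanitizer.Sanitize(v)`, since `Sanitize` has optional parameters and does not convert directly to `Func<string, string>`. Two caveats: the sanitizer still HTML-encodes `&` inside individual values (for example in a plain-text field containing "A & B"), and Web Forms hidden fields such as `__VIEWSTATE` probably should be excluded.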
Where are you inserting the filter? Is it this: https://docs.microsoft.com/en-us/dotnet/api/system.web.httprequest.filter ?
@mganss Yes, by overriding the Read method and applying the sanitizer there.
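How about modifying the Request.Form collection in Global.asax.cs? It is read-only by default, but it can be unlocked temporarily via reflection:
```csharp
using System;
using System.Reflection;

// In Global.asax.cs; `sanitizer` is assumed to be an HtmlSanitizer field on this class.
void Application_BeginRequest(object sender, EventArgs e)
{
    var form = Request.Form;
    // Request.Form is read-only; IsReadOnly is a protected property of
    // NameObjectCollectionBase, so it is unlocked via reflection.
    var isReadOnly = form.GetType().GetProperty("IsReadOnly", BindingFlags.NonPublic | BindingFlags.Instance);
    isReadOnly.SetValue(form, false, null);
    form["Name"] = sanitizer.Sanitize(form["Name"]);
    isReadOnly.SetValue(form, true, null); // lock the collection again
}
```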
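The Global.asax approach above handles a single field. A sketch that extends it to every posted field, which addresses the "hundreds of pages" concern without touching each page, could look like this. It assumes the same reflection trick on the non-public `IsReadOnly` property keeps working (it is undocumented behavior). `SanitizeAll` and its `skipKey` predicate are hypothetical names, and skipping keys that start with `__` is an assumption meant to leave Web Forms fields such as `__VIEWSTATE` and `__EVENTVALIDATION` untouched.

```csharp
using System;
using System.Collections.Specialized;
using System.Reflection;

public static class FormCollectionSanitizer
{
    // Hypothetical helper: sanitizes every value in a (possibly read-only)
    // NameValueCollection such as Request.Form, skipping keys for which skipKey returns true.
    public static void SanitizeAll(NameValueCollection form,
                                   Func<string, string> sanitize,
                                   Func<string, bool> skipKey)
    {
        // IsReadOnly is a protected property declared on NameObjectCollectionBase,
        // so reflection is needed to unlock the collection temporarily.
        PropertyInfo isReadOnly = typeof(NameObjectCollectionBase).GetProperty(
            "IsReadOnly", BindingFlags.NonPublic | BindingFlags.Instance);
        bool wasReadOnly = (bool)isReadOnly.GetValue(form, null);
        isReadOnly.SetValue(form, false, null);
        try
        {
            // AllKeys returns a key array that later changes to the collection do not modify,
            // so removing and re-adding entries inside the loop is safe.
            foreach (string key in form.AllKeys)
            {
                if (skipKey(key)) continue;
                string[] values = form.GetValues(key);
                if (values == null) continue;
                form.Remove(key);                 // drop the original value(s)...
                foreach (string value in values)  // ...and re-add each one sanitized
                    form.Add(key, sanitize(value));
            }
        }
        finally
        {
            isReadOnly.SetValue(form, wasReadOnly, null); // restore the original lock state
        }
    }
}
```

Inside `Application_BeginRequest` it might be called as `FormCollectionSanitizer.SanitizeAll(Request.Form, v => sanitizer.Sanitize(v), k => k != null && k.StartsWith("__"));`. Repeated fields keep all their values, and the lock state is restored even if the sanitizer throws.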