Newtonsoft.Json.Schema icon indicating copy to clipboard operation
Newtonsoft.Json.Schema copied to clipboard

URI validation struggling with encoded en-dash

Open alexdawes opened this issue 3 years ago • 2 comments

Hi,

I'm having a bit of trouble relating to the validation of urls containing certain encoded characters that is causing a bit of a headache.

using Newtonsoft.Json.Linq;
using Newtonsoft.Json.Schema;

namespace JsonSchemaTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var schema = JSchema.Parse("{ \"type\": \"object\", \"properties\": { \"url\": { \"type\": \"string\", \"format\": \"uri\" } } }");
            var json = JObject.Parse("{ \"url\": \"https://example.com/foo%20bar%E2%80%93baz\" }");

            try
            {
                json.Validate(schema);
                Console.WriteLine("Passed");
            }
            catch (Exception e)
            {
                Console.WriteLine($"Failed: {e.Message}");
            }
        }
    }
}

In the above, the url fails validation using 3.0.14.

However, if I remove either the %20 (encoded space) or the %E2%80%93 (encoded en-dash) from the string, then validation passes. I cannot work out why these two sets of encoded parameters occuring together would cause the url to be deemed as invalid, though I suspect this is a bug as other JSON schema validators do not seem to have an issue with this.

Complicating things further, I cannot reproduce the error using your online validator, which I assume is running using the same code....

image

Any assistance you can lend here would be greatly appreciated!

alexdawes avatar May 25 '21 17:05 alexdawes

Yeah, validation of URIs is a bit borked.

Newtonsoft.Json.Schema uses the System.Uri.IsWellFormedUriString method for validating URIs: https://github.com/JamesNK/Newtonsoft.Json.Schema/blob/5a68513e38d899d49f0842ba82a2235154ce3c18/Src/Newtonsoft.Json.Schema/Infrastructure/Validation/PrimativeScope.cs#L296-L299

Unfortunately, Uri.IsWellFormedUriString is a bit ...uh... temperamental, with its behavior changing depending on the runtime environment. No wonder that you couldn't verify the behavior in the online behavior.

For example, look at this little dotnetfiddle: https://dotnetfiddle.net/Jr0OyW It runs under .NET 4.7.2. Note how the Uri.IsWellFormedUriString method returns true.

Now, the same little dotnetfiddle example, but this time running under .NET 5.0: https://dotnetfiddle.net/tTLfut Uhh, the exact same code, but now Uri.IsWellFormedUriString returns false.

The issue is already known (see here: https://github.com/dotnet/runtime/issues/34031) to the .NET team, but unfortunately no milestone has been set for fixing it. That means, to fix the issue here in Newtonsoft.Json.Schema, calls to Uri.IsWellFormedUriString would have to be replaced by a different, more reliable Uri validation routine. Or keep waiting for the day when the issue is going to be fixed in whatever future .NET version... :-(

elgonzo avatar Jul 03 '21 13:07 elgonzo

Hi @elgonzo - thank you so much for that detail. I had no idea URI validation was so fragile in .NET. I sense I'm about to go into a bit of a wormhole looking into it further 😅.

I'm going to leave this ticket open in case someone has any good ideas for ways this could be fixed in JSON.NET (eg other validation methods that could be swapped in, perhaps behind a configuration option to maintain backwards compatibility, as you suggest) - or at least until an admin closes it instead.

alexdawes avatar Jul 03 '21 19:07 alexdawes