Newtonsoft.Json.Schema
URI validation struggling with encoded en-dash
Hi,
I'm having trouble validating URLs that contain certain encoded characters, and it is causing a bit of a headache.
using System;
using Newtonsoft.Json.Linq;
using Newtonsoft.Json.Schema;

namespace JsonSchemaTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Schema with a single string property that must satisfy the "uri" format.
            var schema = JSchema.Parse("{ \"type\": \"object\", \"properties\": { \"url\": { \"type\": \"string\", \"format\": \"uri\" } } }");

            // URL containing both an encoded space (%20) and an encoded en-dash (%E2%80%93).
            var json = JObject.Parse("{ \"url\": \"https://example.com/foo%20bar%E2%80%93baz\" }");

            try
            {
                // Validate throws if the document does not match the schema.
                json.Validate(schema);
                Console.WriteLine("Passed");
            }
            catch (Exception e)
            {
                Console.WriteLine($"Failed: {e.Message}");
            }
        }
    }
}
In the above, the URL fails validation with version 3.0.14.
However, if I remove either the %20 (encoded space) or the %E2%80%93 (encoded en-dash) from the string, then validation passes. I cannot work out why these two encoded sequences occurring together would cause the URL to be deemed invalid, though I suspect this is a bug, as other JSON schema validators do not seem to have an issue with it.
Complicating things further, I cannot reproduce the error using your online validator, which I assume runs the same code.
Any assistance you can lend here would be greatly appreciated!
Yeah, validation of URIs is a bit borked.
Newtonsoft.Json.Schema uses the System.Uri.IsWellFormedUriString method for validating URIs:
https://github.com/JamesNK/Newtonsoft.Json.Schema/blob/5a68513e38d899d49f0842ba82a2235154ce3c18/Src/Newtonsoft.Json.Schema/Infrastructure/Validation/PrimativeScope.cs#L296-L299
Unfortunately, Uri.IsWellFormedUriString is a bit ...uh... temperamental, with its behavior changing depending on the runtime environment. No wonder you couldn't reproduce the behavior in the online validator.
For example, look at this little dotnetfiddle: https://dotnetfiddle.net/Jr0OyW
It runs under .NET 4.7.2. Note how the Uri.IsWellFormedUriString method returns true.
Now, the same little dotnetfiddle example, but this time running under .NET 5.0: https://dotnetfiddle.net/tTLfut
Uhh, the exact same code, but now Uri.IsWellFormedUriString returns false.
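For anyone who wants to compare runtimes locally, here is a minimal sketch of that kind of check. It is an assumption about what the fiddles do, not a copy of them; the URL is the one from the original report, and UriKind.Absolute is assumed to match what the library passes. No expected output is given, because the result depends on the runtime, which is the whole point:

```csharp
using System;
using System.Runtime.InteropServices;

class UriCheck
{
    static void Main()
    {
        // URL from the original report: encoded space plus encoded en-dash.
        const string url = "https://example.com/foo%20bar%E2%80%93baz";

        // Print the runtime so that results from different environments can be compared.
        Console.WriteLine(RuntimeInformation.FrameworkDescription);

        // The check under discussion; UriKind.Absolute is an assumption here.
        bool wellFormed = Uri.IsWellFormedUriString(url, UriKind.Absolute);
        Console.WriteLine($"IsWellFormedUriString: {wellFormed}");
    }
}
```

Running the same file against a .NET Framework target and a .NET 5 target should show whether the two environments disagree.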
The issue is already known to the .NET team (see here: https://github.com/dotnet/runtime/issues/34031), but unfortunately no milestone has been set for fixing it. That means, to fix the issue here in Newtonsoft.Json.Schema, calls to Uri.IsWellFormedUriString would have to be replaced by a different, more reliable URI validation routine. Or keep waiting for the day when the issue is going to be fixed in whatever future .NET version... :-(
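As one possible shape for such a replacement, here is a rough sketch based on Uri.TryCreate. To be clear, this is not what Newtonsoft.Json.Schema does, and LenientUriCheck and IsAbsoluteUri are hypothetical names made up for illustration. Uri.TryCreate parses the string rather than applying the stricter well-formedness checks, so it is likely to accept input that a strict RFC 3986 validator would reject (unescaped spaces, for example). Treat it as a trade-off, not a drop-in fix:

```csharp
using System;

static class LenientUriCheck
{
    // Hypothetical helper: not part of Newtonsoft.Json.Schema or of .NET itself.
    public static bool IsAbsoluteUri(string value)
    {
        // Succeeds when the string can be parsed as an absolute URI.
        return Uri.TryCreate(value, UriKind.Absolute, out Uri parsed) && parsed.IsAbsoluteUri;
    }

    static void Main()
    {
        // URL from the original report.
        const string url = "https://example.com/foo%20bar%E2%80%93baz";
        Console.WriteLine($"TryCreate-based check: {IsAbsoluteUri(url)}");
    }
}
```

In application code, a check like this could be run on the value separately from schema validation, for instance after relaxing or removing the "format": "uri" keyword in the schema. Whether that is acceptable depends on how strict the URI check needs to be.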
Hi @elgonzo - thank you so much for that detail. I had no idea URI validation was so fragile in .NET. I sense I'm about to go into a bit of a wormhole looking into it further 😅.
I'm going to leave this ticket open in case someone has any good ideas for ways this could be fixed in JSON.NET (e.g. other validation methods that could be swapped in, perhaps behind a configuration option to maintain backwards compatibility, as you suggest) - or at least until an admin closes it instead.