Support HTML5 entities
Hi! Would it be possible to add support for HTML5 entities? The .NET team dropped the PR because the HTML5 entities are not backwards compatible and there was little interest, so they decided not to update it yet.
A few examples I have run into today are &excl; &lpar; &rpar; &comma; ...
Hello @leoshusar ,
Just to make sure, what exactly is the behavior you are looking for? Could you show us an example?
I know there is already some stuff that we support in this part.
See:
- https://github.com/zzzprojects/html-agility-pack/blob/master/src/HtmlAgilityPack.Shared/HtmlEntity.cs#L54
- https://github.com/zzzprojects/html-agility-pack/blob/c41452a1ebd2f7549767b4924596cccc3eca8ded/src/HtmlAgilityPack.Shared/HtmlAttribute.cs#L229
It may well be that we do not support what you are looking for, but that is the part of your request I'm not sure about.
Best Regards,
Jon
Hi, @JonathanMagnan,
for example this string: {[()]},!@"€#&~ˇ^˘°=;
when you use e.g. this website for encoding, you will get this fully encoded string:
{[()]},!@"€#&~ˇ^˘°=;
and these are the outputs when you try to decode it in C#:
HttpUtility.HtmlDecode: {[()]},!@"?#&~ˇ^˘°=;
HtmlEntity.DeEntitize: {[()]},!@"?#&~ˇ^˘°=;
because neither of these decoders has HTML5 support. Here is the W3 spec with all the HTML5 characters; there are 2231 of them :) But there are some differences between HTML4 and 5 (noted here), for example:
The &lang; and &rang; named character references now expand to U+27E8 and U+27E9 (mathematical left/right angle bracket) instead of U+2329 and U+232A (left/right-pointing angle bracket), respectively.
so the DeEntitizer cannot just be updated with new characters. And that's also the reason why the PR was not merged in dotnet.
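To illustrate the compatibility problem, here is a minimal sketch of what an opt-in HTML5 mode could look like. Everything in it is hypothetical: `Html5EntitySketch`, `Decode` and the `html5` flag are not part of HtmlAgilityPack or .NET, and the tables are tiny excerpts rather than the full 2231-entry list. The point is only that the old and new tables can live side by side, so existing callers keep the HTML4 values for `lang`/`rang` unless they ask for HTML5.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not an existing HtmlAgilityPack or .NET API.
public static class Html5EntitySketch
{
    // Excerpt of the HTML4 table: the two names whose meaning changed in HTML5.
    private static readonly Dictionary<string, string> Html4Names = new Dictionary<string, string>
    {
        ["lang"] = "\u2329", // left-pointing angle bracket
        ["rang"] = "\u232A", // right-pointing angle bracket
    };

    // Excerpt of the HTML5 table: a few new names plus the changed ones.
    private static readonly Dictionary<string, string> Html5Names = new Dictionary<string, string>
    {
        ["excl"] = "!",
        ["lpar"] = "(",
        ["rpar"] = ")",
        ["comma"] = ",",
        ["lang"] = "\u27E8", // mathematical left angle bracket
        ["rang"] = "\u27E9", // mathematical right angle bracket
    };

    // Matches named references that end with a semicolon, e.g. "&lpar;".
    private static readonly Regex NamedReference =
        new Regex(@"&([A-Za-z][A-Za-z0-9]*);", RegexOptions.Compiled);

    // Decodes named references using the HTML4 excerpt by default,
    // or the HTML5 excerpt when html5 is true. Unknown names are left untouched.
    public static string Decode(string text, bool html5 = false)
    {
        Dictionary<string, string> table = html5 ? Html5Names : Html4Names;
        return NamedReference.Replace(text, match =>
            table.TryGetValue(match.Groups[1].Value, out string value) ? value : match.Value);
    }
}
```

With this sketch, `Html5EntitySketch.Decode("&lpar;x&rpar;&excl;", html5: true)` gives `(x)!`, while the default call leaves the same input unchanged, and `&lang;` decodes to U+2329 by default but to U+27E8 in HTML5 mode. A real implementation would also need numeric references and the legacy names that are valid without a trailing semicolon, which are omitted here.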