html-agility-pack

HtmlNode.InnerText Not working properly

Open ghost opened this issue 3 years ago • 2 comments

InnerText returns "&nbsp;"

1. Description

<p id="demo">hello &nbsp; </p>

var text = htmlNode.InnerText;

// Actual:

"hello &nbsp; "

// Expected:
"hello   "

4. Any further technical details

Add any relevant detail that can help us, such as:

  • HAP version: 1.8.1
  • .NET version: .NET 4.0

Mar 02 '21 14:03 ghost

Hello @MartinHenkeQP,

For backward compatibility, we chose to leave this behavior as it is. Many people use the library to parse text and expect to get the &nbsp; rather than a space (even though a space is really what should be expected).

From past experience, we have learned that this kind of small fix generally causes a lot of issues for people using our library in production, so we try as much as possible not to touch it.

You can fix it on your side by simply decoding the HTML:

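using System.Web;      // HttpUtility
using HtmlAgilityPack; // HtmlDocument

HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(@"<p id=""demo""> hello &nbsp; </p>");

// InnerText keeps the entity as written; HtmlDecode turns &nbsp; into U+00A0.
var text = HttpUtility.HtmlDecode(htmlDocument.DocumentNode.InnerText);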
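
A minimal sketch of the same workaround for projects that do not reference System.Web: the decoding step can also be done with System.Net.WebUtility.HtmlDecode or with HAP's own HtmlEntity.DeEntitize. The class and method names below (InnerTextDecoding, ViaWebUtility, ViaHap) are hypothetical and exist only for illustration. Note that &nbsp; decodes to U+00A0 (a non-breaking space), not to an ordinary U+0020 space.

```csharp
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class InnerTextDecoding
{
    // Parses the markup and decodes InnerText with the framework decoder.
    public static string ViaWebUtility(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return WebUtility.HtmlDecode(doc.DocumentNode.InnerText);
    }

    // Parses the markup and decodes InnerText with HAP's entity decoder.
    public static string ViaHap(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
}
```

Calling InnerTextDecoding.ViaWebUtility(@"<p id=""demo"">hello &nbsp; </p>") should return a string that no longer contains the literal "&nbsp;" text. If ordinary spaces are needed, the U+00A0 character still has to be replaced separately.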

Best Regards,

Jon


Mar 08 '21 17:03 JonathanMagnan

Dear Jon,

Thanks for the info. However, to simplify the process and to avoid this issue being reported again, it might be an option to provide another method, e.g. GetInnerText(), and to add a comment on the InnerText property pointing to GetInnerText() for the decoded text. This would not break compatibility and would support developers who need the decoded text.
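
Until something like this exists in the library, the idea can be approximated on the caller's side with an extension method. The sketch below is an assumption, not HAP API: the name GetDecodedInnerText and the normalizeNbsp parameter are made up for illustration, and only HtmlNode.InnerText and HtmlEntity.DeEntitize come from the library.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical extension; HtmlAgilityPack does not ship this method.
public static class HtmlNodeTextExtensions
{
    // Returns InnerText with HTML entities decoded.
    // When normalizeNbsp is true, U+00A0 is replaced by an ordinary space.
    public static string GetDecodedInnerText(this HtmlNode node, bool normalizeNbsp = false)
    {
        if (node == null) throw new ArgumentNullException("node");

        string decoded = HtmlEntity.DeEntitize(node.InnerText);
        return normalizeNbsp ? decoded.Replace('\u00A0', ' ') : decoded;
    }
}
```

With this in scope, htmlNode.GetDecodedInnerText(normalizeNbsp: true) would give the space-only text expected in the report, while InnerText itself stays unchanged for existing callers.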

Best regards,

Martin


From: Jonathan Magnan Sent: Monday, March 8, 2021 18:03 To: zzzprojects/html-agility-pack Cc: Martin Henke; Mention Subject: Re: [zzzprojects/html-agility-pack] HtmlNode.InnerText Not working properly (#427)

Mar 09 '21 08:03 ghost