html-agility-pack

Parsing error

Open RahulMathew opened this issue 5 years ago • 6 comments

Description

If an element's attribute value contains an apostrophe inside single quotes, e.g. <img src='/image.jpg' alt='Savannah's streets'>

the quote inside the quotes ('Savannah's streets') breaks the HTML.

The output will be <img src='/image.jpg' alt='Savannah' s="" streets'="" >

Exception - Malformed HTML

alt='Savannah' s="" streets'=""

Further technical details

  • HAP version: still occurs in 1.11.7
  • .NET version: 4.6

RahulMathew avatar Jun 20 '19 20:06 RahulMathew

Hello @RahulMathew ,

I'm not sure what the problem is here; we get the same behavior when using an HTML inspector directly in a browser.

<img src="/image.jpg" alt="Savannah" s="" streets'="">

The HTML is malformed, so there is not much we can do. We do pretty much the same thing a browser does here, unless I'm missing something.
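When the markup is generated by your own code, the usual way around this is to make the attribute well-formed before it reaches the parser. Below is a minimal sketch of that idea, not part of HAP: the `AttributeQuoting` class and `BuildImgTag` helper are hypothetical names used only for illustration. It relies on the standard `System.Net.WebUtility.HtmlEncode`, which encodes an apostrophe as `&#39;`, and wraps the values in double quotes.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class AttributeQuoting
{
    // Builds an <img> tag whose attribute values are HTML-encoded and
    // wrapped in double quotes, so an apostrophe cannot end the value early.
    public static string BuildImgTag(string src, string alt)
    {
        string encodedSrc = WebUtility.HtmlEncode(src);
        string encodedAlt = WebUtility.HtmlEncode(alt); // ' becomes &#39;
        return $"<img src=\"{encodedSrc}\" alt=\"{encodedAlt}\">";
    }

    public static void Main()
    {
        Console.WriteLine(BuildImgTag("/image.jpg", "Savannah's streets"));
        // prints: <img src="/image.jpg" alt="Savannah&#39;s streets">
    }
}
```

The resulting string can then be passed to `HtmlDocument.LoadHtml`; the attribute should come through as a single `alt` value, though that round trip is not tested here.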

Best Regards,

Jonathan


JonathanMagnan avatar Jun 21 '19 00:06 JonathanMagnan

Hi,

The input HTML is correct, but after we load it using the Load method and read the HTML back, an attribute like the following is broken.

e.g.

case 1

string html = "Savannah streets'

correct output

Savannah streets -- works fine since the attribute value does not contain an apostrophe within the single quotes

case 2

string html = "Savannah's streets'

malformed output

<img src="/image.jpg" alt="Savannah" s="" streets'=""> -- the HTML output from HtmlAgilityPack breaks because it does not parse an apostrophe within single quotes, and you get the above output.

'Savannah's streets' -- the apostrophe in 's breaks the HTML. It is easy to reproduce this.

Thanks

Rahul Mathew

RahulMathew avatar Jun 21 '19 01:06 RahulMathew

Hi,

The input HTML is correct, but after we load it using the Load method and read the HTML back, an attribute like the following is broken.

e.g.

case 1

Savannah streets

correct output

Savannah streets -- works fine since the attribute value does not contain an apostrophe within the single quotes

case 2

<img src="/image.jpg" alt='Savannah's streets' />

malformed output

<img src="/image.jpg" alt="Savannah" s="" streets'=""> -- the HTML output from HtmlAgilityPack breaks because it does not parse an apostrophe within single quotes, and you get the above output.

'Savannah's streets' -- the apostrophe in 's breaks the HTML. It is easy to reproduce this.

Thanks

RahulMathew avatar Jun 21 '19 01:06 RahulMathew

Hi,

I am attaching a screenshot of the bug.

Thanks Rahul Mathew

RahulMathew avatar Jun 21 '19 01:06 RahulMathew

I believe you did not attach your screenshot correctly ;)

JonathanMagnan avatar Jun 21 '19 10:06 JonathanMagnan

Hi,

Please check your email inbox; it has the screenshot, in case you couldn't find it on GitHub.

Thanks Rahul Mathew

RahulMathew avatar Jun 21 '19 11:06 RahulMathew