html-agility-pack

htmlDoc.DocumentNode.SelectSingleNode("someXpathValue") intermittently returns null

Open · Futuresmo opened this issue 7 years ago • 6 comments

Initially this function worked fine, returning object values as expected. Recently it started returning null. I expect neither an empty collection nor null from this call, since the HTML document loads without issues. The XPath value seems fine as well, because the function occasionally returns values as expected.

Futuresmo, May 28 '18 06:05

Hello @Futuresmo ,

Can you give me an example? Our code has no random behavior: for a given HTML document it either always works or never does, so we suspect the server sometimes returns different content (usually caused by bot detection).

Best Regards,

Jonathan



JonathanMagnan, May 28 '18 12:05

htmlDoc.DocumentNode.SelectSingleNode("//div[h2]"); initially returned a "div" node. For the past couple of weeks I have been getting null instead.

I make at most 5-10 calls per day, so I am not sure why this would be considered a bot. Is there any workaround?

Futuresmo, May 29 '18 10:05

Hello @Futuresmo ,

Do you have the link as well?

If it always happens, perhaps they simply changed the HTML. If it happens only from time to time, there is not much we can do: the library most likely behaves the same way every time, and it is the HTML that differs.

Best Regards,

Jonathan

JonathanMagnan, May 29 '18 12:05

https://forexlive.com/orders/!/fx-option-expiries-for-the-1400-gmt-cut-28-march-2018-20180328

I just tested it successfully, then got a null value a minute later.

Futuresmo, May 29 '18 12:05

Hello @Futuresmo ,

If you look at the current HTML, you will find that it is almost empty, because the site detected that the request came from a script/robot rather than from a browser.

Changing the UserAgent might help, but I'm not aware of a UserAgent that works with this site.

var web = new HtmlWeb();
web.UserAgent = "Mozilla/5.0";

Unfortunately, I don't believe we will be able to help you further with this issue.

Best Regards,

Jonathan

JonathanMagnan, May 29 '18 19:05

Not sure if this is relevant, but I compared the HtmlDocument returned from the URL above: in the successful case it has many more child nodes. In the null case, the nodes referenced in the XPath are missing, which is why the node object is returned as null. Changing the user agent as described above does not seem to make any difference.

Futuresmo, May 31 '18 14:05
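The diagnosis reached in this thread (SelectSingleNode returns null because the served HTML no longer contains the matching nodes) can be reproduced offline. Below is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The two HTML strings are made-up stand-ins for a full page and a stripped bot-detection page, and ExpiryScraper, FindOrReport and Fetch are hypothetical names, not part of the library. The sketch checks for null instead of dereferencing the result, and reports how many elements the page actually contains, which makes a stripped response easy to spot.

```csharp
using System;
using HtmlAgilityPack;

public static class ExpiryScraper
{
    // Hypothetical helper: returns the first node matching xpath, or null plus a
    // diagnostic message when the loaded page does not contain it.
    public static HtmlNode FindOrReport(HtmlDocument doc, string xpath, out string diagnostic)
    {
        // SelectSingleNode returns null when nothing matches; it does not throw.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node != null)
        {
            diagnostic = "found";
            return node;
        }

        // Count element nodes so a near-empty (stripped) page stands out.
        int elementCount = 0;
        foreach (HtmlNode d in doc.DocumentNode.Descendants())
        {
            if (d.NodeType == HtmlNodeType.Element) elementCount++;
        }
        diagnostic = $"no match for {xpath}; page has only {elementCount} elements";
        return null;
    }

    // Hypothetical network variant. HtmlWeb.UserAgent and HtmlWeb.Load are real
    // HtmlAgilityPack members; the UA string is the one suggested in the thread
    // and is not known to get past this site's bot detection.
    public static HtmlDocument Fetch(string url)
    {
        var web = new HtmlWeb { UserAgent = "Mozilla/5.0" };
        return web.Load(url);
    }

    public static void Main()
    {
        // Made-up stand-ins: a page with the expected structure and a stripped one.
        const string fullPage = "<html><body><div><h2>Expiries</h2><p>EUR/USD</p></div></body></html>";
        const string strippedPage = "<html><body><p>Please enable JavaScript</p></body></html>";

        var full = new HtmlDocument();
        full.LoadHtml(fullPage);
        var stripped = new HtmlDocument();
        stripped.LoadHtml(strippedPage);

        HtmlNode hit = FindOrReport(full, "//div[h2]", out _);
        HtmlNode miss = FindOrReport(stripped, "//div[h2]", out string why);

        Console.WriteLine(hit?.Name);  // div
        Console.WriteLine(miss == null ? why : miss.Name);
    }
}
```

Treating null as "the server sent a different page" rather than as a library fault matches the maintainer's explanation. Setting UserAgent in Fetch only reflects the suggestion made in the thread; the reporter found that it made no difference for this site.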