
[BUG] Html Agility Pack: await web.LoadFromWebAsync not working on Windows Server 2016 and 2019

Open · broomop opened this issue 4 years ago · 8 comments

1. Description

I am getting no response; it acts as if nothing is happening. I tried the async approach from this thread, https://github.com/zzzprojects/html-agility-pack/issues/171, and it just sits there with the cursor blinking and does nothing.

```csharp
var web = new HtmlWeb();
web.UsingCache = false;
web.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36";
var doc = await web.LoadFromWebAsync(page);
```

I also tried without the user agent, but it made no difference.

2. Exception

No exceptions, and I cannot debug because I don't have the right tools on the servers. It works fine on Windows 10.

Exception message:
None.

3. Fiddle or Project

I am unable to provide one, but I can see a traffic spike; it looks like I even receive the data, but nothing is shown.

4. Any further technical details

Add any relevant details that can help us, such as:

  • HAP version: 1.11.34.0
  • .NET version: .NET Framework 4.6.1
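When an awaited call hangs without ever throwing, as described above, one way to turn the silent hang into a visible error on a server where no debugger is available is to race the task against a timer. This is a minimal sketch, not part of HAP: `TimeoutHelper` and `WithTimeout` are hypothetical names, and the 30-second limit in the usage line is an arbitrary assumption.

```csharp
using System;
using System.Threading.Tasks;

public static class TimeoutHelper
{
    // Awaits `task`, but throws TimeoutException if it has not completed within `timeout`.
    // This only stops *waiting*: the underlying operation is not cancelled and keeps running.
    public static async Task<T> WithTimeout<T>(Task<T> task, TimeSpan timeout)
    {
        Task finished = await Task.WhenAny(task, Task.Delay(timeout)).ConfigureAwait(false);
        if (finished != task)
        {
            throw new TimeoutException($"Operation did not complete within {timeout.TotalSeconds} s.");
        }
        // The task has completed; awaiting it returns its result or rethrows its exception.
        return await task.ConfigureAwait(false);
    }
}
```

Usage, with `web` and `page` from the snippet in the description: `var doc = await TimeoutHelper.WithTimeout(web.LoadFromWebAsync(page), TimeSpan.FromSeconds(30));`. If this throws `TimeoutException` on the server but not on Windows 10, you at least know the load never completes, as opposed to completing with an empty document.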

broomop · Jun 20 '21 11:06

Hello @broomop,

Everything worked when I tried it.

I recommend trying with ConfigureAwait(false):

```csharp
await web.LoadFromWebAsync(url).ConfigureAwait(false);
```

Depending on the type of application, this may be required to avoid a thread deadlock.

Best Regards,

Jon
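A sketch of where `ConfigureAwait(false)` goes and why it matters. In applications that have a `SynchronizationContext` (a WinForms/WPF UI thread, or classic ASP.NET on .NET Framework), a plain `await` resumes on that context; if a caller further up blocks on the task with `.Result` or `.GetAwaiter().GetResult()`, it holds that context and the continuation can never run. `PageInfo`, `GetTitleAsync`, and the `loadDocument` delegate are illustrative names, not HAP APIs; the delegate lets the same method wrap `HtmlWeb.LoadFromWebAsync` or any other loader.

```csharp
using System;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class PageInfo
{
    // loadDocument: hypothetical delegate, e.g. () => new HtmlWeb().LoadFromWebAsync(url).
    public static async Task<string> GetTitleAsync(Func<Task<HtmlDocument>> loadDocument)
    {
        // ConfigureAwait(false): continue on a thread-pool thread instead of the captured
        // SynchronizationContext, so this continuation never waits for a UI or request
        // thread that a blocking caller may be holding.
        HtmlDocument doc = await loadDocument().ConfigureAwait(false);

        // Nothing below touches UI controls, so it is safe to run off the original context.
        return doc.DocumentNode.SelectSingleNode("//title")?.InnerText;
    }
}
```

Usage: `string title = await PageInfo.GetTitleAsync(() => web.LoadFromWebAsync(url));`. `ConfigureAwait(false)` only affects continuations in the method where it is written; if an awaited library call internally resumes on the captured context, blocking on it from that context can still deadlock, so keeping the call chain `async` all the way up is the more robust fix.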



JonathanMagnan · Jun 21 '21 12:06

Did you try this on Windows Server 2016 or 2019? I had no issues on Windows 10, only on the server editions.

broomop · Jun 21 '21 16:06

Yes, the test was on Windows Server 2016.

It might also be caused by a security policy on your side.

The library is using an HttpClient: https://github.com/zzzprojects/html-agility-pack/blob/08694be2d81e552ec87e19082396f5d57d8832c2/src/HtmlAgilityPack.Shared/HtmlWeb.cs#L2364

So perhaps you could try downloading the text on your side and then simply have HAP parse it. Unfortunately, I don't really see anything we could change that would help you ;(
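A minimal sketch of that split, assuming you download the page yourself: one `HttpClient` is created once and reused, the status code is checked through the `HttpResponseMessage`, and only the resulting string is handed to HAP's `HtmlDocument.LoadHtml`. `PageFetcher`, `FetchAsync`, `Parse`, and the 30-second timeout are illustrative choices, not part of HAP.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class PageFetcher
{
    // One HttpClient for the lifetime of the process. Microsoft's HttpClient documentation
    // lists GetAsync among the thread-safe instance methods, so it can be shared.
    private static readonly HttpClient Client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30)
    };

    // Downloads the page with our own HttpClient, then lets HAP parse the text.
    public static async Task<HtmlDocument> FetchAsync(string url)
    {
        using (HttpResponseMessage response = await Client.GetAsync(url).ConfigureAwait(false))
        {
            response.EnsureSuccessStatusCode(); // throws HttpRequestException on 4xx/5xx
            string html = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
            return Parse(html);
        }
    }

    // Pure parsing step: no network involved.
    public static HtmlDocument Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }
}
```

This does not by itself remove whatever is blocking HTTP traffic on the server (for example a security policy); it moves the download into code you control, where the status code, timeout, and exceptions are visible.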

JonathanMagnan · Jun 21 '21 18:06

Hi, it seems that HttpClient isn't well liked anymore. I am trying to see if I can unblock HttpClient; supposedly it has to do with ASP.NET, using web.config, allowing any logins, etc. If you can help any further with this, that would be great; otherwise, thanks for your help.

broomop · Jun 22 '21 02:06

After reading some more, I found someone mentioning that HttpClient is not thread-safe the way it is used here, and that an HttpResponseMessage should be used as well:

https://docs.microsoft.com/en-us/dotnet/api/system.net.http.httpclient?view=netframework-4.7.2

broomop · Jun 22 '21 03:06

I have found that on a Windows server you have to code it exactly like this and wait for each call to finish:

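```csharp
using System;
using System.Threading.Tasks;
using HtmlAgilityPack;

class Program
{
	// Placeholder: the original snippet does not show where `url` is defined.
	private const string url = "https://example.com/";

	static void Main()
	{
		// Async load, blocked on synchronously. The original discarded the result;
		// here it is printed so both paths can be compared.
		try
		{
			HtmlDocument fromAsync = GetHtmlDocumentAsync().GetAwaiter().GetResult();
			Console.WriteLine(fromAsync.Text);
		}
		catch (Exception ex)
		{
			Console.WriteLine(ex.Message);
		}

		// Synchronous load.
		try
		{
			HtmlDocument test = GetHtmlDocument();
			Console.WriteLine(test.Text);
		}
		catch (Exception ex)
		{
			Console.WriteLine(ex.Message);
		}
		Console.ReadLine();
	}

	public static async Task<HtmlDocument> GetHtmlDocumentAsync()
	{
		HtmlWeb web = new HtmlWeb();
		return await web.LoadFromWebAsync(url);
	}

	public static HtmlDocument GetHtmlDocument()
	{
		HtmlWeb web = new HtmlWeb();
		return web.Load(url);
	}
}
```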

instead of just making lots of `await web.LoadFromWebAsync(url);` calls with no error handling.

broomop · Jun 22 '21 03:06

Could I also ask how you would write some sort of threaded method so that my application does not lock up?

broomop · Jun 22 '21 17:06
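One possible approach to the question above, sketched under assumptions: rather than a dedicated thread, keep the calls asynchronous and `await` them, so the calling thread (for example a UI thread) stays free while requests are in flight. The sketch also caps how many requests run at once with `SemaphoreSlim`; `ConcurrentLoader`, `LoadAllAsync`, and the default limit of 4 are hypothetical choices, not part of HAP.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrentLoader
{
    // Starts one load per URL, with at most `maxConcurrency` in flight, and awaits them all.
    // Nothing here blocks a thread: callers should `await` the returned task (for example
    // from an async event handler) instead of calling .Result on it.
    public static async Task<T[]> LoadAllAsync<T>(
        IEnumerable<string> urls, Func<string, Task<T>> load, int maxConcurrency = 4)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            IEnumerable<Task<T>> tasks = urls.Select(async url =>
            {
                await gate.WaitAsync().ConfigureAwait(false);
                try
                {
                    return await load(url).ConfigureAwait(false);
                }
                finally
                {
                    gate.Release(); // always free the slot, even if the load fails
                }
            });
            // WhenAll keeps results in input order and waits for every task before returning.
            return await Task.WhenAll(tasks).ConfigureAwait(false);
        }
    }
}
```

Usage from an async method: `HtmlDocument[] docs = await ConcurrentLoader.LoadAllAsync(urls, u => new HtmlWeb().LoadFromWebAsync(u));`.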

Hello @broomop,

Thank you for the information about HttpClient not being thread-safe.

In this case, I really recommend grabbing the HTML on your side and simply using the LoadHtml method of HtmlDocument to parse it.

The way HAP has been built doesn't currently work with a static HttpClient, so there are some issues we need to discuss first to determine how we want to solve this.

However, you can already solve it on your side by applying the best practices you have already found and then passing the HTML to HAP.

JonathanMagnan · Jun 23 '21 14:06