MySqlConnector icon indicating copy to clipboard operation
MySqlConnector copied to clipboard

MySqlException continuously thrown after restart mysql server

Open ewfian opened this issue 2 years ago • 12 comments

Software versions MySqlConnector version: 2.1.11 Server type (MySQL, MariaDB, Aurora, etc.) and version: MySQL 5.7 .NET version: net6.0 (Optional) ORM NuGet packages and versions: Pomelo.EntityFrameworkCore.MySql 6.0.2

Describe the bug MySqlException continuously thrown after restart mysql server.

Step to reproduce

  1. Run mysql:5.7 image in docker
  2. Run following demo code
  3. Restart mysql container after the demo is fully started
  4. The execptions will continuously thrown

Exception

MySqlConnector.MySqlException (0x80004005): Connect Timeout expired. All pooled connections are in use.
   at MySqlConnector.MySqlConnection.CreateSessionAsync(ConnectionPool pool, Int32 startTickCount, Nullable`1 ioBehavior, CancellationToken cancellationToken) in D:\Repos\MySqlConnector\src\MySqlConnector\MySqlConnection.cs:line 937
   at MySqlConnector.MySqlConnection.OpenAsync(Nullable`1 ioBehavior, CancellationToken cancellationToken) in D:\Repos\MySqlConnector\src\MySqlConnector\MySqlConnection.cs:line 410
   at ConsoleApp1.Program.c.b__2_0>d.MoveNext() in D:\Repos\MySqlConnector\ConsoleApp1\Program.cs:line 34

Code sample

using MySqlConnector;
using MySqlConnector.Logging;

namespace ConsoleApp1
{
	class Logger : IMySqlConnectorLogger
	{
		public bool IsEnabled(MySqlConnectorLogLevel level) => true;
		public void Log(MySqlConnectorLogLevel level, string message, object?[]? args = null, Exception? exception = null) => Console.WriteLine("[{0}] {1}", level, message);
	}
	class LoggerProvider : IMySqlConnectorLoggerProvider
	{
		public IMySqlConnectorLogger CreateLogger(string name) => new Logger();
	}
	internal class Program
	{
		const string connString = "server=localhost;port=3306;user=kr;password=*;database=*;Character Set=utf8mb4;";
		const string sql = @"SELECT id FROM user WHERE id = 1";
		static async Task Main(string[] args)
		{
			MySqlConnectorLogManager.Provider = new LoggerProvider();
			Console.WriteLine("Hello, World!");

			for (var i = 0; i < 10000; i++)
			{
				for (var j = 0; j < 1000; j++)
				{
					_ = Task.Run(async () =>
					{
						try
						{
							using (var conn = new MySqlConnection(connString))
							{
								await conn.OpenAsync();

								using (var cmd = new MySqlCommand(sql, conn))
								using (var reader = await cmd.ExecuteReaderAsync())
									while (await reader.ReadAsync())
										Console.WriteLine(reader.GetInt32(0));
							}
						}
						catch (Exception ex)
						{
							Console.WriteLine(ex);
						}
					});
				}
				await Task.Delay(1000);
			}

			Console.ReadLine();
		}
	}
}

Expected behavior It should able to back to normal after restart finished.

Additional context Our program is hosted on Azure and working behind a Load Balancer while the mysql server is hosted on AWS. The Load Balancer or other various network devices on the path from Azure to AWS has some potentially instability. Such as NAT aging. Restart server is just a way to simulate NAT aging. Node switching may also occur in AWS RDS. We expects there should have some resilience to recover from a bad network condition that has just been endured or node switching instead of requires manual intervention.

The issue is actually happed by using a linq with Pomelo.EntityFrameworkCore.MySql. When the pressure is high, restarting mysql repeatedly can reproduce this issue. The above code is just a simple demo, maybe there are other reasons.

Releated to: https://github.com/mysql-net/MySqlConnector/issues/1071 https://github.com/mysql-net/MySqlConnector/issues/1033 https://github.com/mysql-net/MySqlConnector/issues/859 https://github.com/mysql-net/MySqlConnector/issues/947

ewfian avatar Aug 03 '22 04:08 ewfian

Dear Community,

Unfortunately, this is exactly the problem I am experiencing and it endangers the stability of all our systems. I have checked our codebase extensively and can rule out any known operating errors.

This problem should be taken very seriously.

We also have 3 database servers behind a load balancer. This load balancer distributes the load evenly among the 3 servers. In the event of network issues, the system switches to an available server. Connections strings are configured accordingly so that the DNS switch is recognised.

But for whatever reason the pool is full, even though hardly any traffic is coming in (100 requests per hour) and what's more, this always happens after several days of operation.

pwhsoft avatar Aug 17 '23 09:08 pwhsoft

I'm not able to reproduce this with the test program, although I modified it slightly to print summary statistics rather than log everything to the console:

using MySqlConnector;

const string connString = "server=localhost;port=3306;user=root;password=pass;Character Set=utf8mb4;";
const string sql = @"SELECT 1;";

int success = 0, eos = 0, utcth = 0, other = 0;

for (var i = 0; i < 10000; i++)
{
	for (var j = 0; j < 100; j++)
	{
		_ = Task.Run(async () =>
		{
			try
			{
				using (var conn = new MySqlConnection(connString))
				{
					await conn.OpenAsync();

					using var cmd = new MySqlCommand(sql, conn);
					await cmd.ExecuteScalarAsync();
					Interlocked.Increment(ref success);
				}
			}
			catch (Exception ex)
			{
				if (ex is MySqlException { ErrorCode: MySqlErrorCode.UnableToConnectToHost })
					Interlocked.Increment(ref utcth);
				else if (ex.InnerException is EndOfStreamException)
					Interlocked.Increment(ref eos);
				else
					Interlocked.Increment(ref other);
			}
		});
	}
	Console.Write($"\rSuccess:{success} UnableToConnect:{utcth} EndOfStream:{eos} Other:{other}");
	await Task.Delay(1000);
}

When the program starts, Success increases by 100 on each "tick". When I stop the Docker container, UnableToConnect steadily increases. When I restart the Docker container, potentially dozens of EndOfStreamExceptions are logged as the container is not ready for connections. After a few seconds, it is fully initialized and the program goes back to logging Success. At no point does Connect Timeout expired. All pooled connections are in use. happen.

It should able to back to normal after restart finished.

This happens for me when I run the test app.

Any other suggestions on how to reproduce the problem?

bgrainger avatar Aug 17 '23 13:08 bgrainger

Connections strings are configured accordingly so that the DNS switch is recognised.

What are your connection string settings?

bgrainger avatar Aug 17 '23 13:08 bgrainger

@bgrainger I just used the latest driver(2.2.7) and the above sample code can still reproduce

<PackageReference Include="MySqlConnector" Version="2.2.7" />

image

Step 1: Start mysql using the following script

docker run -d
-e MYSQL_ROOT_PASSWORD=******
-e MYSQL_DATABASE=******
-e MYSQL_USER=kr
-e MYSQL_PASSWORD=******
-p 3306:3306
--name lk-mysql
--restart=always
--network=host
mysql:5.7

Step 2: Run sample code

Step 3: Repeatedly restart mysql dozens of times to simulate network instability image

ewfian avatar Aug 17 '23 13:08 ewfian

Hello, thank you for your fast response.

I use the following connection string: server=xxx.net;database=abc;user=cde;password=123;SslMode=REQUIRED;DnsCheckInterval=30;ConnectionLifeTime=60;MaximumPoolsize=200;

I experimented with all the settings over 1 month. Always the same problem.

pwhsoft avatar Aug 17 '23 13:08 pwhsoft

Hello, thank you for your fast response.

I use the following connection string: server=xxx.net;database=abc;user=cde;password=123;SslMode=REQUIRED;DnsCheckInterval=30;ConnectionLifeTime=60;MaximumPoolsize=200;

I experimented with all the settings over 1 month. Always the same problem.

And I use the same connector. (2.2.7)

pwhsoft avatar Aug 17 '23 14:08 pwhsoft

You're using docker run ... --network=host; is your client OS Linux? Maybe this is a difference between Linux and Windows (which I'm using for testing) as a client OS?

bgrainger avatar Aug 17 '23 14:08 bgrainger

You're using docker run ... --network=host; is your client OS Linux? Maybe this is a difference between Linux and Windows (which I'm using for testing) as a client OS?

Database server and client application both run on Linux systems. (Ubunut 20.04)

pwhsoft avatar Aug 17 '23 14:08 pwhsoft

You're using docker run ... --network=host; is your client OS Linux? Maybe this is a difference between Linux and Windows (which I'm using for testing) as a client OS?

@bgrainger I added --network=host just because I am running docker in wsl2 now. I can also reproduce it when running in "Docker Desktop on Windows". I think the main point of reproduction is to restart multiple times

This is just to simulate an unstable network situation. In fact, our production environment is running in docker on Azure Linux VM, and mysql use AWS RDS, just like the description in the first question.

Our program is hosted on Azure and working behind a Load Balancer while the mysql server is hosted on AWS. The Load Balancer or other various network devices on the path from Azure to AWS has some potentially instability. Such as NAT aging. Restart server is just a way to simulate NAT aging. Node switching may also occur in AWS RDS. We expects there should have some resilience to recover from a bad network condition that has just been endured or node switching instead of requires manual intervention.

ewfian avatar Aug 17 '23 14:08 ewfian

@bgrainger Anyway .. If there is anything we can do to help you find a solution, please let us know.

pwhsoft avatar Aug 17 '23 15:08 pwhsoft

Well ... since you offered...

You could clone the code (you probably want to use the v2.2 branch), build from source, and see if you can debug the issue in your environment where it happens. Happy to review and merge any PR you develop! Or just some more insight into how the problem happens would be very helpful.

bgrainger avatar Aug 17 '23 17:08 bgrainger

FWIW I captured a screen recording of my running your test app. (Note that I used MySqlConnectorLogManager.Provider = new ConsoleLoggerProvider(MySqlConnectorLogLevel.Trace, isColored: true); so it logs errors in red.)

As I restart the Docker container, you can see flashes of errors being logged, then once the server restarts and network connectivity is restored, the program settles back down to regular operation.

https://files.logoscdn.com/v1/files/71652353/assets/15164306/content.gif?signature=agQ2LGMLqhdSYYz_c-YpdaqtV6w

bgrainger avatar Aug 17 '23 18:08 bgrainger