MySqlConnector icon indicating copy to clipboard operation
MySqlConnector copied to clipboard

System.FormatException: Couldn't interpret '1899-00-00 00:00:00' as a valid DateTime during MySqlConnection.GetSchemaAsync("TABLES", ...)

Open CVET-TJGutjahr opened this issue 2 years ago • 16 comments

Software versions MySqlConnector version: 2.1.10 (nuget latest stable) Server type (MySQL, MariaDB, Aurora, etc.) and version: 8.0.19 .NET version: .NET 6.0 (6.0.300) (Optional) ORM NuGet packages and versions: N/A

Describe the bug When issuing .GetSchemaAsync("TABLES", ...) on a MySqlConnection instance, I am getting the aforementioned exception. I am using GetSchemaAsync("TABLES", ...) to retrieve a list of tables as part of a bulk-import ETL process. Somehow this particular database has some invalid DateTime representations in the schema itself? I have ConvertZeroDateTime set to true. AllowZeroDateTime was undefined/default but I also tried setting that to true which then yielded a different error relating to it being converted from MySqlDateTime to DateTime during the transformation to a schema DataTable.

Additional edge case checking may be needed for garbage DateTimes.

Exception

System.FormatException: Couldn't interpret '1899-00-00 00:00:00' as a valid DateTime
 ---> System.ArgumentOutOfRangeException: Year, Month, and Day parameters describe an un-representable DateTime.
   at System.DateTime..ctor(Int32 year, Int32 month, Int32 day, Int32 hour, Int32 minute, Int32 second, Int32 millisecond, DateTimeKind kind)
   at MySqlConnector.Core.Row.ParseDateTime(ReadOnlySpan`1 value) in /_/src/MySqlConnector/Core/Row.cs:line 485
   --- End of inner exception stack trace ---
   at MySqlConnector.Core.Row.ParseDateTime(ReadOnlySpan`1 value) in /_/src/MySqlConnector/Core/Row.cs:line 556
   at MySqlConnector.Core.TextRow.GetValueCore(ReadOnlySpan`1 data, ColumnDefinitionPayload columnDefinition) in /_/src/MySqlConnector/Core/TextRow.cs:line 90
   at MySqlConnector.Core.Row.GetValue(Int32 ordinal) in /_/src/MySqlConnector/Core/Row.cs:line 38
   at MySqlConnector.Core.Row.GetValues(Object[] values) in /_/src/MySqlConnector/Core/Row.cs:line 457
   at MySqlConnector.Core.SchemaProvider.FillDataTableAsync(IOBehavior ioBehavior, DataTable dataTable, String tableName, CancellationToken cancellationToken) in /_/src/MySqlConnector/Core/SchemaProvider.cs:line 996
   at MySqlConnector.Core.SchemaProvider.FillTables(IOBehavior ioBehavior, DataTable dataTable, CancellationToken cancellationToken) in /_/src/MySqlConnector/Core/SchemaProvider.cs:line 864
   at MySqlConnector.Core.SchemaProvider.GetSchemaAsync(IOBehavior ioBehavior, String collectionName, CancellationToken cancellationToken) in /_/src/MySqlConnector/Core/SchemaProvider.cs:line 54
   at MySQLImporter.Console.ImportService.RunAsync(CancellationToken stoppingToken) in C:\Users\Travis.Gutjahr\source\repos\OMITTED\Implementations\CustomTools\MySQLImporter.Console\MySQLImporter.Console\ImportService.cs:line 68
   at MySQLImporter.Console.HostedServiceBase.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\Travis.Gutjahr\source\repos\OMITTED\Implementations\CustomTools\MySQLImporter.Console\MySQLImporter.Console\HostedServiceBase.cs:line 33

Code sample

            IList<string?> tableNames;
            using (MySqlConnection srcSqlConn = new MySqlConnection(sourceDatabaseOptions.ConnectionString))
            {
                logger.LogDebug("Connecting to source database...");
                await srcSqlConn.OpenAsync(stoppingToken);

                logger.LogDebug("Retrieving a list of tables from the source database...");
                using (DataTable schemaTable = await srcSqlConn.GetSchemaAsync("TABLES", stoppingToken))
                    tableNames = schemaTable.Rows.Cast<DataRow>().Where(a => string.Equals(sourceDatabaseOptions.Database, a["TABLE_SCHEMA"]?.ToString(), StringComparison.OrdinalIgnoreCase)).Select(a => a["TABLE_NAME"]?.ToString()).ToList();
            }

Expected behavior .GetSchemaAsync("TABLES", ...) to complete without exception and proceed to the next line where schemaTable object is enumerated.

Additional context Not sure how you want to resolve this as this appears to be something specific to this data set and I am not sure I can reproduce this in a new database as it related to schema information. If you have any suggestions as to how you'd want this fixed I can either submit a PR or I can test your proposed fixed by grabbing a branch. Other databases work fine with my code but this database in particular is having a hard time due to this weird DateTime in the schema.

CVET-TJGutjahr avatar Jun 05 '22 16:06 CVET-TJGutjahr

One thing to bear in mind is that since GetSchema and GetSchemaAsync return a DataTable, these DateTimes WILL be converted to System.DateTime and therefore whatever fix is implemented, it'll have to adhere to the range of System.DateTime and not MySqlDateTime.

CVET-TJGutjahr avatar Jun 05 '22 17:06 CVET-TJGutjahr

That forked commit is just what I did to workaround it. I didn't submit it as a pull request since I failed to read the contributing guide prior to making that change. I am also not certain that this is the best way to fix this issue.

Perhaps setting AllowZeroDateTime to true so it returns from there as a MySqlDateTime and then adding logic to the IConvertable logic so that when it inevitably translates to DateTime for the GetSchema DataTable it can handle out of range values? Just kind of thinking out loud.

CVET-TJGutjahr avatar Jun 06 '22 14:06 CVET-TJGutjahr

Not sure how you want to resolve this as this appears to be something specific to this data set and I am not sure I can reproduce this in a new database as it related to schema information.

Now that you've got GetSchemaAsync("TABLES") working, can you look at the output to see where DateTime.MinValue (i.e., 1/1/0001 12:00:00 AM) is showing up? That might provide some insight on how to reproduce this bug.

I'd like to have a reproducible test case first, to (a) understand why it's happening and that the fix is appropriate, and (b) add an integration/regression test for it.

bgrainger avatar Jun 06 '22 19:06 bgrainger

if AllowZeroDateTime is false

  1. SchemaProvider class is initialized and GetSchemaAsync method is called on it.
  2. fillAction (FillTables method) looked up from m_schemaCollections ({ "Tables", FillTables },) and called.
  3. DataTable columns are added here with typeof(DateTime) which is expected.
  4. FillDataTableAsync is called which issues the query containing the offensive column data
  5. "SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE, ENGINE, VERSION, ROW_FORMAT, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH, MAX_DATA_LENGTH, INDEX_LENGTH, DATA_FREE, AUTO_INCREMENT, CREATE_TIME, UPDATE_TIME, CHECK_TIME, TABLE_COLLATION, CHECKSUM, CREATE_OPTIONS, TABLE_COMMENT FROM INFORMATION_SCHEMA.TABLES;"
  6. MySqlDataReader's GetValues is called.
  7. GetValue is called for each column.
  8. GetValueCore is called.
  9. Switch statement case ColumnType.DateTime: is hit and ParseDateTime method is called.
  10. The catch statement in ParseDateTime is hit at line 550 (master branch) after executing the return on line 547 due to the month and day both being 0, unrepresentable in System.DateTime.

else if AllowZeroDateTime is true

...same until after MySqlDataReader's GetValues call completes. Error occurs on "dataTable.Rows.Add(rowValues);" which performs an implicit IConvertable conversion of MySqlDateTime to System.DateTime and errors correctly due to IsValidDateTime being false. Line 201 of MySqlDateTime.cs is the line it hits and errors at with InvalidCastException. "DateTime IConvertible.ToDateTime(IFormatProvider? provider) => IsValidDateTime ? GetDateTime() : throw new InvalidCastException();"

ParseDateTime's ReadOnlySpan value parameter =...

new ReadOnlySpan<byte>[19] { 49, 56, 57, 57, 45, 48, 48, 45, 48, 48, 32, 48, 48, 58, 48, 48, 58, 48, 48 }

which translates to "1899-00-00 00:00:00" as UTF8. Valid MySqlDateTime but not value System.DateTime.

As an alternative to the fix I made, this could also be fixed if we set AllowZeroDateTime to true and then adjust line 201 of MySqlDateTime's IConvertible.ToDateTime to return DateTime.MinValue if IsValidDateTime is false instead of throwing InvalidCastException. That could introduce bugs in any consumer that expects that exception to be thrown however.

I do not like the alternative fix route mentioned above because for the regular querying of data on this connection, I do not want AllowZeroDateTime to be true and this error is happening as a result of GetSchema's expectation and requirement that the column be in DateTime.

Ultimately I believe the best fix is something akin to your line 497 of Row.cs (ParseDateTime method) which is... if (year == 0 && month == 0 && day == 0) which is basically what I had implemented but with the additional constraint that it only perform my logic if it is a schema information request. We could implement it for all ParseDateTime runs. if the year, month, or day is out of range for System.DateTime AND AllowZeroDateTime is false (meaning we don't allow out of range values) AND ConvertZeroDateTime is true, we return DateTime.MinValue. The ONLY reason I didn't just do that is the potential to break user-code if someone expects an error to be thrown as opposed to DateTime.MinValue being returned but line 500 already returns DateTime.MinValue if year, month, and day are all 0 so...

The best fix may just be...

		// If we're converting zero or invalid DateTimes (ConvertZeroDateTime) but not allowing MySqlDateTime objects (AllowZeroDateTime), then we need to make sure we're within range. If we're not, return DateTime.MinValue. See https://github.com/mysql-net/MySqlConnector/issues/1180
		if (Connection.ConvertZeroDateTime && !Connection.AllowZeroDateTime && (year < DateTime.MinValue.Year || year > DateTime.MaxValue.Year || month < DateTime.MinValue.Month || month > DateTime.MaxValue.Month || day < DateTime.MinValue.Day || day > DateTime.MaxValue.Day))
			return DateTime.MinValue;

CVET-TJGutjahr avatar Jun 07 '22 12:06 CVET-TJGutjahr

Side note: most databases on this MySQL server have a valid UPDATE_TIME, either NULL or an actual value ('2020-01-21 05:45:28') that isn't a date before computers as we know them exists. One specific database has the offending information in INFORMATION_SCHEMA's UPDATE_TIME but since the dates are consistent and well-formed, I don't suspect corruption, just... MySQL weirdness.

CVET-TJGutjahr avatar Jun 07 '22 13:06 CVET-TJGutjahr

One specific database has the offending information in INFORMATION_SCHEMA's UPDATE_TIME but since the dates are consistent and well-formed, I don't suspect corruption, just... MySQL weirdness.

Thanks; this was the information I was looking for.

Curiously, googling for mysql "1899-00-00 00:00:00" turns up just this issue, no it's not a widespread problem in MySQL Server as far as I can tell.

bgrainger avatar Jun 07 '22 14:06 bgrainger

Agreed, I can't say I've ever encountered this until now.

CVET-TJGutjahr avatar Jun 07 '22 14:06 CVET-TJGutjahr

This run failed with the following exception:

[xUnit.net 00:00:04.32]     SideBySide.SchemaProviderTests.GetSchema(schemaName: "Partitions") [FAIL]
  Skipped SideBySide.CommandTimeoutTests.CommandTimeoutResetsOnReadSync [1 ms]
  Failed SideBySide.SchemaProviderTests.GetSchema(schemaName: "Partitions") [1 s]
  Error Message:
   System.FormatException : Couldn't interpret '1899-00-�5 �5:�5:�5' as a valid DateTime
  Stack Trace:
     at MySqlConnector.Core.Row.ParseDateTime(ReadOnlySpan`1 value) in /_/src/MySqlConnector/Core/Row.cs:line 485
   at MySqlConnector.Core.TextRow.GetValueCore(ReadOnlySpan`1 data, ColumnDefinitionPayload columnDefinition) in /_/src/MySqlConnector/Core/TextRow.cs:line 90
   at MySqlConnector.Core.Row.GetValue(Int32 ordinal) in /_/src/MySqlConnector/Core/Row.cs:line 38
   at MySqlConnector.Core.Row.GetValues(Object[] values) in /_/src/MySqlConnector/Core/Row.cs:line 457
   at MySqlConnector.Core.SchemaProvider.FillDataTableAsync(IOBehavior ioBehavior, DataTable dataTable, String tableName, CancellationToken cancellationToken) in /_/src/MySqlConnector/Core/SchemaProvider.cs:line 997
   at MySqlConnector.Core.SchemaProvider.FillPartitions(IOBehavior ioBehavior, DataTable dataTable, CancellationToken cancellationToken) in /_/src/MySqlConnector/Core/SchemaProvider.cs:line 403
   at MySqlConnector.Core.SchemaProvider.GetSchemaAsync(IOBehavior ioBehavior, String collectionName, CancellationToken cancellationToken) in /_/src/MySqlConnector/Core/SchemaProvider.cs:line 55
   at MySqlConnector.MySqlConnection.GetSchema(String collectionName) in /_/src/MySqlConnector/MySqlConnection.cs:line 596
   at SideBySide.SchemaProviderTests.GetSchema(String schemaName) in D:\a\1\s\tests\SideBySide\SchemaProviderTests.cs:line 105

bgrainger avatar Jul 09 '22 13:07 bgrainger

Did that error occur on my fork branch or on master with a data set you were able to find?

CVET-TJGutjahr avatar Jul 27 '22 14:07 CVET-TJGutjahr

No, it seemed like a spurious build failure unrelated to the code that was pushed. So possibly some kind of bug in MySQL Server?

bgrainger avatar Jul 27 '22 15:07 bgrainger

Another run failed (using MySQL 8.0.15 on Windows as the server):

[xUnit.net 00:00:04.79]     SideBySide.SchemaProviderTests.GetSchema(schemaName: "Tables") [FAIL]
  Failed SideBySide.SchemaProviderTests.GetSchema(schemaName: "Tables") [1 s]
  Error Message:
   System.FormatException : Couldn't interpret '1899-00-�5 �5:�5:�5' as a valid DateTime
  Stack Trace:
     at MySqlConnector.Core.Row.ParseDateTime(ReadOnlySpan`1 value) in /_/src/MySqlConnector/Core/Row.cs:line 556
   at MySqlConnector.Core.TextRow.GetValueCore(ReadOnlySpan`1 data, ColumnDefinitionPayload columnDefinition) in /_/src/MySqlConnector/Core/TextRow.cs:line 90
   at MySqlConnector.Core.Row.GetValue(Int32 ordinal) in /_/src/MySqlConnector/Core/Row.cs:line 38
   at MySqlConnector.Core.Row.GetValues(Object[] values) in /_/src/MySqlConnector/Core/Row.cs:line 457
   at MySqlConnector.Core.SchemaProvider.FillDataTableAsync(IOBehavior ioBehavior, DataTable dataTable, String tableName, CancellationToken cancellationToken) in /_/src/MySqlConnector/Core/SchemaProvider.cs:line 996
   at MySqlConnector.Core.SchemaProvider.FillTables(IOBehavior ioBehavior, DataTable dataTable, CancellationToken cancellationToken) in /_/src/MySqlConnector/Core/SchemaProvider.cs:line 864
   at MySqlConnector.Core.SchemaProvider.GetSchemaAsync(IOBehavior ioBehavior, String collectionName, CancellationToken cancellationToken) in /_/src/MySqlConnector/Core/SchemaProvider.cs:line 54
   at MySqlConnector.MySqlConnection.GetSchema(String collectionName) in /_/src/MySqlConnector/MySqlConnection.cs:line 596
   at SideBySide.SchemaProviderTests.GetSchema(String schemaName) in D:\a\1\s\tests\SideBySide\SchemaProviderTests.cs:line 107

bgrainger avatar Aug 12 '22 21:08 bgrainger

Searching for "1899-00-É5 É5:É5:É5" finds a couple of hits in Google. É is 0xC9, which could possibly be 201 or -55 as a numeric value.

bgrainger avatar Aug 12 '22 21:08 bgrainger

I have a theory.

In 8.0.15, the code to convert a date to a string is here: https://github.com/mysql/mysql-server/blob/mysql-8.0.15/mysys/my_time.cc#L1096-L1176

Note the comment:

This functions don't check that given MYSQL_TIME structure members are in valid range. If they are not, return value won't reflect any valid date either.

Specifically, if any value is -1, then the following C++ code will print C9 35, i.e., É5:

  temp = ltime->hour;
  temp2 = temp / 10;
  temp = temp - temp2 * 10;
  *to++ = (char)('0' + (char)(temp2));
  *to++ = (char)('0' + (char)(temp));

So why do you get 1890-00-00 00:00:00? Because that code has changed in a newer version to clamp the output to 0-100:

  • https://github.com/mysql/mysql-server/blob/mysql-8.0.30/mysys/my_time.cc#L1333-L1353 calls format_two_digits
  • format_two_digits performs a range check and sets out-of-range values to 0: https://github.com/mysql/mysql-server/blob/mysql-8.0.30/mysys/my_time.cc#L1231-L1242

It's my belief that somewhere in MySQL, there is still a bug that causes MYSQL_TIME fields to be set to -1. In older versions of MySQL, that prints 1899-00-É5 É5:É5:É5. In newer versions, that prints 1899-00-00 00:00:00. Either way, it's a server-side bug that's sending invalid data.

MySqlConnector could work around that bug and coerce these two values to a valid DateTime, but unless there is a constant value that these bogus strings are meant to represent, I'm concerned that could result in silent data loss. Better to throw an exception and have the client retry to get the right data.

bgrainger avatar Aug 12 '22 23:08 bgrainger

That sounds fine for general use. However during my use case the issue was occurring during .GetSchemaAsync("TABLES", ...) on the MySqlConnection. No amount of retrying will help. This is an ETL flow, we won't have the ability to upgrade or alter their configuration in any way, including getting them to update MySQL Server.

Perhaps a compromise of an option to transform bad data for GetSchema calls only? The schema DataTable returned by GetSchema calls in all ADO.NET implementations always seems to be DateTime so replacing them with MySqlDate class wouldn't work. I think it has to be fudged a bit here to be both compatible and also work.

CVET-TJGutjahr avatar Aug 15 '22 11:08 CVET-TJGutjahr

Just double-checking: you don't have any way to create a repro for this problem; it's just something you observe with the schema for a few (?) tables in one particular database on one particular server?

bgrainger avatar Aug 16 '22 20:08 bgrainger

I can repro it with this specific database but I can't share the database as it has customer's data. I could probably drop all other tables and truncate the table in question, leaving only the schema. I'll work on that today and see if the database still blows up. If it does, I'll supply that database.

CVET-TJGutjahr avatar Aug 17 '22 11:08 CVET-TJGutjahr

Moved teams, no easily sharable sample dataset. Closing as not fixing.

CVET-TJGutjahr avatar Mar 29 '23 14:03 CVET-TJGutjahr