MySqlConnector icon indicating copy to clipboard operation
MySqlConnector copied to clipboard

Improve MySqlBulkCopy API

Open bgrainger opened this issue 3 years ago • 8 comments

The API for MySqlBulkCopy was patterned on SqlBulkCopy, which has no return value from its WriteToServer method.

https://github.com/mysql-net/MySqlConnector/issues/1005 points out that warnings are not surfaced to the user. MySqlBulkCopy.WriteToServer will throw if any rows fail to be inserted, but the warnings could be indications of significant data loss. You can attach an InfoMessage handler, or manually execute SHOW WARNINGS after, but it's not obvious that you should do either of those things.

The MySqlBulkCopy API should be improved to help clients use it without data loss.

One possibility would be returning a Results type, that has WarningCount, RowsCopiedCount and RowsInsertedCount. Clients could be told to check that RowsCopiedCount == RowsInsertedCount && WarningCount ==0. Getting the actual warnings would still take additional work. (Add a helper method that invokes SHOW WARNINGS; and returns an IReadOnlyList<MySqlError>?)

Or the API could return IReadOnlyList<MySqlError> and automatically call SHOW WARNINGS behind the scenes if the warning count from the operation was non-zero. However, because these aren't easily parseable (e.g., to identify the rows/cells that were wrong), it's not clear what should be done with them (display them directly to the end user?) beyond just checking if the collection is non-empty, which can be accomplished simply by checking a warning count.

bgrainger avatar Jul 04 '21 16:07 bgrainger

The problem with both of these suggested possibilities is that they're both return types, but if there's an error that prevents the data from being inserted, MySqlBulkCopy.WriteToServerAsync throws an exception and so nothing returns.

This in and of itself isn't necessarily a bad thing. Throwing exceptions on error conditions is idiomatic .NET code. What's needed is to attach the error information to the exception. Would it be possible to query for SHOW WARNINGS at this point and attach the result to the exception object? Perhaps something along the lines of

var errorMsg = string.Join(Environment.NewLine, ErrorList.Select(e => $"{e.Level}: {e.Message}"));
throw new MySqlException(MySqlErrorCode.BulkCopyFailed, "{0} rows were copied to {1} but only {2} were inserted. Reason:\n\n{3}".FormatInvariant(RowsCopied, tableName, rowsInserted, errorMsg));

masonwheeler avatar Jul 06 '21 02:07 masonwheeler

Would it be possible to query for SHOW WARNINGS at this point and attach the result to the exception object?

That's certainly possible.

Should the exception only be thrown if at least one row failed to be inserted (the current behaviour), or if there is at least one warning (which might be benign, but could imply undesirable data loss)? If only for errors, the user still needs to check for warnings manually.

bgrainger avatar Jul 06 '21 03:07 bgrainger

Good question. I'm still fairly new to MySQL, and I don't know what all causes "warnings," but I have seen a few that should definitely be straight-up errors. IMO anything that's going to cause data loss or data corruption should not be allowed to fail silently.

masonwheeler avatar Jul 06 '21 11:07 masonwheeler

Another possibility:

The API returns IReadOnlyList<MySqlError> and the documentation instructs to check that it's empty (to avoid data loss). If rows inserted is not equal to rows copied, then an exception is thrown. It could be a MySqlBulkCopyException with RowsCopied, RowsInserted, and Warnings properties.

This will allow the API to fail with an exception when there's definite data loss (row not inserted), not throw an exception when there might be a benign warning, and provide a way to check for 100% success for fastidious programmers.

Since this will be a breaking change, it would be good to get into 2.0.

bgrainger avatar Oct 16 '21 22:10 bgrainger

If rows inserted is not equal to rows copied, then an exception is thrown.

A MySqlException is already thrown in this scenario. A derived type could be added if reporting copied vs inserted programmatically is important (but it's already in the exception message).

The API returns IReadOnlyList<MySqlError>

Still makes sense to return MySqlBulkCopyResult with a Warnings property, since that allows for future expansion.

bgrainger avatar Oct 16 '21 22:10 bgrainger

Is this in NuGet now?

masonwheeler avatar Oct 17 '21 03:10 masonwheeler

Not yet; it'll be in the next release. (It is automatically published to GitHub Package Registry: https://github.com/mysql-net/MySqlConnector/packages/39735.)

bgrainger avatar Oct 17 '21 03:10 bgrainger

If an exception is thrown, the user gets no additional details: #1145.

Which was identified in an earlier comment https://github.com/mysql-net/MySqlConnector/issues/1012#issuecomment-874407976

The problem with both of these suggested possibilities is that they're both return types, but if there's an error that prevents the data from being inserted, MySqlBulkCopy.WriteToServerAsync throws an exception and so nothing returns.

bgrainger avatar Feb 25 '22 01:02 bgrainger