crate
crate copied to clipboard
`INSERT INTO SELECT` does not throw an error / warning when some inserts fail
CrateDB version
4.6.8 / 4.7.0
Steps to Reproduce
CREATE TABLE insert_into_source ( ts TIMESTAMP, ts_g TIMESTAMP);
CREATE TABLE insert_into_sink ( ts TIMESTAMP, ts_g TIMESTAMP GENERATED ALWAYS AS date_trunc('minute',ts));
INSERT INTO insert_into_source VALUES (now(),date_trunc('minute',now())+1), (now(),date_trunc('minute',now()));
--INSERT OK, 2 records affected (0.016 seconds)
INSERT INTO insert_into_sink (ts,ts_g) SELECT ts, ts_g FROM insert_into_source;
--INSERT OK, 1 record affected (0.011 seconds)
Expected Result
Statement fails with
Failed to execute upsert shardId=[insert_into_sink][0] id=XOARjn8B6RWc_1mx5EhS
error=java.lang.IllegalArgumentException:
Given value null for generated column ts_g does not match calculation date_trunc('minute', ts) = 1647281100000
Actual Result
no error, just
INSERT OK, 1 record affected (0.011 seconds)
a lower number than expected
Update
This might not be a bug after all, however the situation could be maybe improved to give the user some way to check if an INSERT INTO SELECT
partially failed.
INSERT INTO insert_into_sink VALUES (now(),date_trunc('minute',now())+1), (now(),date_trunc('minute',now()));
returns
Error!
SQLParseException[Given value 1647358980001 for generated column ts_g does not match calculation date_trunc('minute', ts) = 1647358980000]
COPY FROM
has the optionRETURN SUMMARY
to see how many rows failed to be copied and now also the fail_fast
option to stop copy operations in best effort manner when insert error occur.
Possible solution
- Add setting to allow partial inserts/failures
Discussion outcome: Could add a session setting that allows partial failures and causes a "fail fast" behavior (first error encountered gets propagated)
Starting from version 5.5, UPDATE
statements will continue on row failures instead of always showing an error.
The affected rows are displayed by the resulting row count. Related PR
Session setting that allows partial failures and causes a "fail fast" behavior (first error encountered gets propagated)
The new session setting should behave in a similar way for UPDATE
statements.
One possible option to give back a warning could be by means of a NoticeResponse
with severity WARNING
see https://www.postgresql.org/docs/current/protocol-flow.html and https://www.postgresql.org/docs/current/protocol-error-fields.html
We had another discussion about this and concluded that the NoticeResponse
would be the nicer option.
Partly because of bulk request handling where individual "batch" error reporting would work via HTTP, but is problematic for PostgreSQL. PG/DB-API clients typically only expose a row-count array or failure, not one failure per batch.
The other main advantage of a NoticeResponse
is that it can be active by default, so users can always get the additional info.
It's more effort to implement, as it also requires a HTTP response format addition and updates in crash and the admin-ui to show the warnings. It won't make it for 5.7. We can target this for 5.8 instead but that's TBD
We could (later) also additionally add a fail_fast
flag, similar to how it exists for COPY FROM (where error reporting currently works via the RETURN SUMMARY). It wouldn't change the error reporting which would still happen via the notice response, but instead cause it to short-circuit on errors.
Relates to https://github.com/crate/crate/issues/8312 and https://github.com/crate/crate/issues/8949
Main user facing problem:
Confusion if some rows are not inserted (for example because of constraint violation). Current behaviour: Errors are swallowed, number of successfully inserted rows is returned.
Listing all possible options to change behaviour from the user perspective to get additional feedback:
-
Number of inserted rows is returned, errors aren’t swallowed but redirected to
sys.operations
To lookup the errors, users need some identifier.
- HTTP could include it in the response
- for PG protocol we could implement
session_id
and users would lookup bysession_id
insys.operations
. See https://github.com/crate/crate/issues/14939)
-
Number of inserted rows is returned and errors are emitted as warning (see comment above for details)
-
Number of inserted rows is returned. In case of error, show first error and short-circuit the operation (number of inserts is discarded/unknown to a user). Short circuit would be a breaking change unless it’s behind a flag/setting.
-
Number of inserted rows is returned. In case of error, show first error AND number of inserted rows and short-circuit the operation. Short circuit would be a breaking change unless it’s behind a flag/setting.
-
Introduce optional
RETURN SUMMARY
clause forUPDATE/DELETE/INSERT
in addition toRETURNING
. Can’t use bothRETURN SUMMARY
andRETURNING
at the same time