Improve error handling during failed modification
pg_shard's modification logic assumes that any total failure is due to something transient that a retry might overcome. But an INSERT or UPDATE can also fail because of a constraint check, and a simple retry will not fix that unless something else changes.
See #31 for an example. There, the client sees:
# WARNING: Bad result from shard1.demo:5432
# DETAIL: Remote message: duplicate key value violates unique constraint "members_id_key_10007"
# WARNING: Bad result from shard8.demo:5432
# DETAIL: Remote message: duplicate key value violates unique constraint "members_id_key_10007"
# ERROR: could not modify any active placements
A well-written application might want to handle the uniqueness violation in a special fashion, but all pg_shard gives it is a generic error about not being able to modify any placements.
We probably want to try a modification on a placement, then:
- If the error is in the class of things we think a user cares about (constraints, etc.), we fail fast and pass the remote error on to them
- If the error is network-related or otherwise "transient", we continue with the remaining placements. If any modification completes, we mark the placements that failed transiently as bad
At a higher level, we need to handle modification outcomes in a ternary fashion:
- Total Success: the modification completed successfully
- Application Failure: the command reached the remote DB and got a response, but that response was an error
- Infrastructure Failure: the modification didn't even complete, or did so with a network error
Only the third case deserves a "could not modify any active placements" error. In the second, we can fail fast and tell the user what the remote DB reported.
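
As a rough sketch of the three-way handling described above (in Python for brevity; pg_shard itself is C, and none of this is its actual code), one option is to classify each placement's result by SQLSTATE. Everything named here is hypothetical: `Outcome`, `classify`, `modify_placements`, `ApplicationError`, and the convention that an attempt returns a `(sqlstate, message)` pair with `sqlstate` set to `None` when no response came back. Which SQLSTATE classes count as "things a user cares about" is also an assumption; the sketch uses class 22 (data exception) and class 23 (integrity constraint violation).

```python
from enum import Enum
from typing import Callable, List, Optional, Sequence, Tuple


class Outcome(Enum):
    SUCCESS = "success"
    APPLICATION_FAILURE = "application failure"
    INFRASTRUCTURE_FAILURE = "infrastructure failure"


class ApplicationError(Exception):
    """Carries the remote error through to the client unchanged."""

    def __init__(self, sqlstate: str, message: str) -> None:
        super().__init__(message)
        self.sqlstate = sqlstate


# Assumption: SQLSTATE classes treated as application-level errors.
# "22" = data exception, "23" = integrity constraint violation.
APPLICATION_CLASSES = {"22", "23"}


def classify(sqlstate: Optional[str]) -> Outcome:
    """Map one placement's result to one of the three outcomes."""
    if sqlstate is None:
        # No response at all: connection refused, timeout, and so on.
        return Outcome.INFRASTRUCTURE_FAILURE
    if sqlstate == "00000":  # successful_completion
        return Outcome.SUCCESS
    if sqlstate[:2] in APPLICATION_CLASSES:
        return Outcome.APPLICATION_FAILURE
    # Everything else (e.g. class 08, connection exception) is treated
    # as transient in this sketch.
    return Outcome.INFRASTRUCTURE_FAILURE


def modify_placements(
    placements: Sequence[str],
    execute: Callable[[str], Tuple[Optional[str], str]],
) -> Tuple[List[str], List[str]]:
    """Try each placement; return (succeeded, to_mark_bad)."""
    succeeded: List[str] = []
    failed: List[str] = []
    for placement in placements:
        sqlstate, message = execute(placement)
        outcome = classify(sqlstate)
        if outcome is Outcome.APPLICATION_FAILURE:
            # Fail fast: the user should see the real remote error.
            raise ApplicationError(sqlstate, message)
        if outcome is Outcome.SUCCESS:
            succeeded.append(placement)
        else:
            failed.append(placement)
    if not succeeded:
        # Only infrastructure failures can lead here.
        raise RuntimeError("could not modify any active placements")
    return succeeded, failed
```

With this shape, the duplicate-key case from #31 would raise `ApplicationError` with SQLSTATE `23505` on the first placement instead of collapsing into the generic message. The sketch does not address what to do if an application error appears only after an earlier placement has already succeeded; that case would need its own decision.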