crate-python
crate-python copied to clipboard
Add `BulkResponse` wrapper for improved decoding of HTTP bulk responses
About
CrateDB HTTP bulk responses include rowcount= items, either signalling if a bulk operation succeeded or failed.
- success means
rowcount=1 - failure means
rowcount=-2
https://cratedb.com/docs/crate/reference/en/latest/interfaces/http.html#error-handling
References
The code is coming from CrateDB Toolkit, but is generally usable beyond there.
- https://github.com/crate/cratedb-toolkit/blob/v0.0.27/cratedb_toolkit/io/core.py#L17-L77
- https://github.com/crate/cratedb-toolkit/pull/270
Hi @mfussenegger,
The code is coming from CrateDB Toolkit, but is generally usable beyond there.
- https://github.com/crate/cratedb-toolkit/blob/v0.0.27/cratedb_toolkit/io/core.py#L17-L77
- https://github.com/crate/cratedb-toolkit/pull/270
The idea is to have a concise code representation over here, for the BulkProcessor component, by abstracting away the handling of magic numbers from user-side code.
Because CrateDB Toolkit is rather heavy, and I wanted to make it reusable, I wanted to add it elsewhere. Another option would be to use the now separate cratedb-sqlalchemy package. BulkProcessor depends on it anyway, but BulkResponse doesn't, and would provide a generic little convenience wrapper around CrateDB's special responses to bulk requests.
The idea is to have a concise code representation over here, for the
BulkProcessorcomponent, by abstracting away the handling of magic numbers from user-side code.
That kinda confirms my point in that there's not much more code without the BulkResponse wrapper:
cursor = self.connection.execute(statement=statement, parameters=operation.parameters)
self.connection.commit()
cratedb_bulk_result = getattr(cursor.context, "last_executemany_result", None)
failed_records = 0
success_count = 0
for result in cratedb_bulk_result:
if result["rowcount"] == -2:
failed_records += 1
else:
success_count += 1
self._metrics.count_success_total += success_count
self.progress_bar and self.progress_bar.update(n=success_count)
This to me would even be more readable as it avoids the additional indirection.
Also, given that the DB API spec says:
Return values are not defined.
I think we could extend our return value - as long as we can keep it compatible with the existing one.