crate-python icon indicating copy to clipboard operation
crate-python copied to clipboard

Add `BulkResponse` wrapper for improved decoding of HTTP bulk responses

Open amotl opened this issue 1 year ago • 2 comments

About

CrateDB HTTP bulk responses include rowcount= items, either signalling if a bulk operation succeeded or failed.

  • success means rowcount=1
  • failure means rowcount=-2

https://cratedb.com/docs/crate/reference/en/latest/interfaces/http.html#error-handling

References

The code is coming from CrateDB Toolkit, but is generally usable beyond there.

  • https://github.com/crate/cratedb-toolkit/blob/v0.0.27/cratedb_toolkit/io/core.py#L17-L77
  • https://github.com/crate/cratedb-toolkit/pull/270

amotl avatar Oct 02 '24 20:10 amotl

Hi @mfussenegger,

The code is coming from CrateDB Toolkit, but is generally usable beyond there.

  • https://github.com/crate/cratedb-toolkit/blob/v0.0.27/cratedb_toolkit/io/core.py#L17-L77
  • https://github.com/crate/cratedb-toolkit/pull/270

The idea is to have a concise code representation over here, for the BulkProcessor component, by abstracting away the handling of magic numbers from user-side code.

Because CrateDB Toolkit is rather heavy, and I wanted to make it reusable, I wanted to add it elsewhere. Another option would be to use the now separate cratedb-sqlalchemy package. BulkProcessor depends on it anyway, but BulkResponse doesn't, and would provide a generic little convenience wrapper around CrateDB's special responses to bulk requests.

amotl avatar Oct 07 '24 09:10 amotl

The idea is to have a concise code representation over here, for the BulkProcessor component, by abstracting away the handling of magic numbers from user-side code.

That kinda confirms my point in that there's not much more code without the BulkResponse wrapper:

 cursor = self.connection.execute(statement=statement, parameters=operation.parameters)
 self.connection.commit()
 cratedb_bulk_result = getattr(cursor.context, "last_executemany_result", None)
 failed_records = 0
 success_count = 0
 for result in cratedb_bulk_result:
     if result["rowcount"] == -2:
         failed_records += 1
     else:
         success_count += 1
 self._metrics.count_success_total += success_count
 self.progress_bar and self.progress_bar.update(n=success_count)

This to me would even be more readable as it avoids the additional indirection.

Also, given that the DB API spec says:

Return values are not defined.

I think we could extend our return value - as long as we can keep it compatible with the existing one.

mfussenegger avatar Oct 07 '24 09:10 mfussenegger