
CDRIVER-4363 add client bulk write

Open kevinAlbs opened this issue 9 months ago • 0 comments

Summary

This PR implements the MongoClient.bulkWrite API proposed in: https://github.com/mongodb/specifications/pull/1534

RST documentation was omitted from this PR in case the API changes during review. I plan to add RST documentation in a future PR.

Commits are separated so that the JSON test files and the minor drive-by improvements are easy to filter during review. See the bulk-write.md file in https://github.com/mongodb/specifications/pull/1534 for the expected behavior.

Background

Terms

The following terms are used in this PR description:

  • MongoCollection.bulkWrite refers to the existing API from the CRUD specification. The C driver implements it in terms of mongoc_bulk_operation_t.
  • MongoClient.bulkWrite refers to the new API in the new Bulk Write specification. This PR implements it in terms of the new mongoc_bulkwrite_t handle.

MongoCollection.bulkWrite splits write operations by type, and sends separate insert, update, and delete server commands. MongoCollection.bulkWrite writes to only one collection.

MongoClient.bulkWrite uses the bulkWrite server command. The bulkWrite command supports insert, update, and delete operations for multiple namespaces. The bulkWrite command is introduced in server 8.0. MongoClient.bulkWrite can write to multiple collections.

Both bulk write APIs require splitting large writes into separate commands to stay within server size limits. For example, a single call to MongoClient.bulkWrite with many writes may send multiple bulkWrite commands.
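To illustrate the splitting, here is a minimal sketch that counts how many commands a list of writes would need. It is not the code in this PR: batch_limits_t, count_batches, and both limit fields are hypothetical names, and the greedy in-order packing is an assumption about one reasonable strategy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limits for illustration only. A real driver learns its limits
 * from the server; these names are not part of the C driver. */
typedef struct {
   size_t max_ops_per_batch;   /* maximum number of writes in one command */
   size_t max_bytes_per_batch; /* maximum total encoded bytes in one command */
} batch_limits_t;

/* Packs writes greedily, in order, and reports how many commands are needed.
 * `sizes[i]` is the encoded size of write i. Returns false if a single write
 * is larger than the byte limit, since no batch could ever hold it. */
bool
count_batches (const size_t *sizes,
               size_t n,
               batch_limits_t limits,
               size_t *out_batches)
{
   assert (out_batches != NULL);
   assert (limits.max_ops_per_batch > 0);
   assert (limits.max_bytes_per_batch > 0);

   size_t batches = 0;
   size_t ops_in_batch = 0;
   size_t bytes_in_batch = 0;

   for (size_t i = 0; i < n; i++) {
      if (sizes[i] > limits.max_bytes_per_batch) {
         return false; /* this write cannot fit in any command */
      }

      const bool no_batch_yet = (batches == 0);
      const bool ops_full = (ops_in_batch == limits.max_ops_per_batch);
      /* Written as a subtraction so the comparison cannot overflow. */
      const bool bytes_full =
         (sizes[i] > limits.max_bytes_per_batch - bytes_in_batch);

      if (no_batch_yet || ops_full || bytes_full) {
         /* Start a new command. */
         batches++;
         ops_in_batch = 0;
         bytes_in_batch = 0;
      }

      ops_in_batch++;
      bytes_in_batch += sizes[i];
    }

   *out_batches = batches;
   return true;
}
```

With hypothetical limits of 3 writes and 100 bytes per command, writes of sizes 40, 40, 40, and 10 need two commands: the third write would push the first command past 100 bytes, so it starts a second one.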

Rationale

The following describes the rationale behind some decisions in this PR:

Models

The specification defines model classes for each write. Here is the DeleteOneModel definition (with comments removed for brevity):

class DeleteOneModel implements WriteModel {
    filter: Document;
    collation: Optional<Document>;
    hint: Optional<Document | String>;
}

This PR represents the required fields in arguments, and groups optional fields in an opaque opts struct:

MONGOC_EXPORT (bool)
mongoc_bulkwrite_append_deleteone (mongoc_bulkwrite_t *self,
                                   const char *ns,
                                   const bson_t *filter,
                                   mongoc_bulkwrite_deleteoneopts_t *opts /* May be NULL */,
                                   bson_error_t *error);

I thought this was a reasonable compromise between matching the spec and existing C driver patterns. If the spec later extends DeleteOneModel, mongoc_bulkwrite_deleteoneopts_t can be extended.
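The opaque opts pattern can be sketched as follows. This is an illustration under assumptions, not the implementation in this PR: every example_* name is hypothetical, and the hint is simplified to a string (the spec also allows a document).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Public view: callers only see a forward declaration, so fields can be
 * added later without breaking existing callers. */
typedef struct example_deleteoneopts example_deleteoneopts_t;

/* Private definition (would live in a .c file in a real library). */
struct example_deleteoneopts {
   char *hint; /* NULL when unset */
};

example_deleteoneopts_t *
example_deleteoneopts_new (void)
{
   /* calloc zero-initializes, so every option starts unset. */
   return calloc (1, sizeof (example_deleteoneopts_t));
}

/* Stores a copy of `hint`. Passing NULL clears the option.
 * Returns false only if memory allocation fails. */
bool
example_deleteoneopts_set_hint (example_deleteoneopts_t *self, const char *hint)
{
   assert (self != NULL);
   char *copy = NULL;
   if (hint != NULL) {
      const size_t len = strlen (hint) + 1;
      copy = malloc (len);
      if (copy == NULL) {
         return false;
      }
      memcpy (copy, hint, len);
   }
   free (self->hint);
   self->hint = copy;
   return true;
}

/* Accepts NULL opts, mirroring the "May be NULL" convention: a NULL opts
 * behaves the same as an opts with nothing set. */
const char *
example_deleteoneopts_get_hint (const example_deleteoneopts_t *self)
{
   return self != NULL ? self->hint : NULL;
}

void
example_deleteoneopts_destroy (example_deleteoneopts_t *self)
{
   if (self == NULL) {
      return;
   }
   free (self->hint);
   free (self);
}
```

Because callers never see the struct layout, a later field (for example, a collation) would only need a new setter rather than a new function signature.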

The existing C driver implementation of MongoCollection.bulkWrite uses bson_t for options:

MONGOC_EXPORT (bool)
mongoc_bulk_operation_remove_one_with_opts (mongoc_bulk_operation_t *bulk,
                                            const bson_t *selector,
                                            const bson_t *opts,
                                            bson_error_t *error); /* OUT */

However, I think a typed mongoc_bulkwrite_deleteoneopts_t is preferable to bson_t. It may ease development through compile-time type checks and autocomplete.

Naming

This PR uses a slightly different API naming pattern. Parts are logically separated by an underscore. Example: mongoc_bulkwriteopts_set_bypassdocumentvalidation. This differs slightly from the established pattern of separating each word. Example in current API: mongoc_bulk_operation_set_bypass_document_validation. I find the logical separation easier to read.

Validation

The existing C driver implementation of MongoCollection.bulkWrite includes default client-side validation:

validate: Construct a bitwise-or of all desired bson_validate_flags_t. Set to false to skip client-side validation of the provided BSON documents.

The default flags for writes are all the same:

const bson_validate_flags_t _mongoc_default_insert_vflags =
   BSON_VALIDATE_UTF8 | BSON_VALIDATE_UTF8_ALLOW_NULL |
   BSON_VALIDATE_EMPTY_KEYS;

UTF-8 validation appears broken: passing invalid UTF-8 for string values does not result in an error. See: CDRIVER-4448. In effect, the default validation only applies BSON_VALIDATE_EMPTY_KEYS, which checks that there are no empty keys (e.g. {"": "foo"}).

CDRIVER-3731 is a request to configure validation at a broader scope. Quoting https://github.com/jeroen/mongolite/issues/206:

It probably makes sense to disable this globally

The C driver benchmarks disable the default validation. I re-enabled the validation and reran the benchmark. This resulted in significant performance drops in the base comparison:

[Figure: Performance regression with validation enabled]

This PR excludes default validation in MongoClient.bulkWrite for four reasons: UTF-8 validation is already broken, there are requests to disable validation, validation negatively impacts performance, and validation is not a spec requirement.

kevinAlbs, May 01 '24 17:05