mongo-c-driver
CDRIVER-4363 add client bulk write
Summary
This PR implements the MongoClient.bulkWrite API proposed in: https://github.com/mongodb/specifications/pull/1534
RST documentation was omitted from this PR in case the API changes during review. I plan to add RST documentation in a future PR.
Commits are separated to ease filtering the JSON test files and minor drive-by improvements. See the bulk-write.md file in https://github.com/mongodb/specifications/pull/1534 for the expected behavior.
Background
Terms
The following terms are used in this PR description:
- MongoCollection.bulkWrite refers to the existing API from the CRUD specification. The C driver implements it in terms of mongoc_bulk_operation_t.
- MongoClient.bulkWrite refers to the new API in the new Bulk Write specification. This PR implements it in terms of the new mongoc_bulkwrite_t handle.
MongoCollection.bulkWrite splits write operations by type, and sends separate insert, update, and delete server commands. MongoCollection.bulkWrite writes to only one collection.
MongoClient.bulkWrite uses the bulkWrite server command. The bulkWrite command supports insert, update, and delete operations for multiple namespaces. The bulkWrite command is introduced in server 8.0. MongoClient.bulkWrite can write to multiple collections.
Both bulk write APIs require splitting large writes into separate commands to stay within server size limits. E.g. a call to MongoClient.bulkWrite with many writes may result in multiple bulkWrite commands sent.
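The splitting described above can be sketched as a greedy pass over the encoded operation sizes. This is only an illustration, not the code in this PR: batch_limits_t, count_batches, and the limit values are hypothetical names, and a real driver would take its limits from the server handshake.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limits for this sketch. */
typedef struct {
   size_t max_bytes; /* max total payload bytes per command */
   size_t max_ops;   /* max number of write operations per command */
} batch_limits_t;

/* Greedily packs `n` operations (encoded sizes in `op_sizes`) into commands
 * and stores the number of commands needed in `*out`.
 * Returns false if a single operation can never fit in one command. */
bool
count_batches (const size_t *op_sizes, size_t n, batch_limits_t limits, size_t *out)
{
   assert (out && (op_sizes || n == 0));
   assert (limits.max_bytes > 0 && limits.max_ops > 0);

   size_t batches = 0, bytes = 0, ops = 0;
   for (size_t i = 0; i < n; i++) {
      if (op_sizes[i] > limits.max_bytes) {
         return false; /* operation too large for any command */
      }
      /* Start a new command when this op would exceed either limit. */
      if (ops == 0 || ops == limits.max_ops ||
          bytes + op_sizes[i] > limits.max_bytes) {
         batches++;
         bytes = 0;
         ops = 0;
      }
      bytes += op_sizes[i];
      ops++;
   }
   *out = batches;
   return true;
}
```

With a 100-byte limit, operations of 40, 40, 40, and 10 bytes need two commands: the third operation would push the first command to 120 bytes, so it starts a second one.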
Rationale
The following describes rationale behind some decisions in this PR:
Models
The specification defines model classes for each write. Here is the DeleteOneModel definition (with comments removed for brevity):
class DeleteOneModel implements WriteModel {
    filter: Document;
    collation: Optional<Document>;
    hint: Optional<Document | String>;
}
This PR represents the required fields in arguments, and groups optional fields in an opaque opts struct:
MONGOC_EXPORT (bool)
mongoc_bulkwrite_append_deleteone (mongoc_bulkwrite_t *self,
                                   const char *ns,
                                   const bson_t *filter,
                                   mongoc_bulkwrite_deleteoneopts_t *opts /* May be NULL */,
                                   bson_error_t *error);
I thought this was a reasonable compromise between matching the spec and existing C driver patterns. If the spec later extends DeleteOneModel, mongoc_bulkwrite_deleteoneopts_t can be extended.
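To illustrate why an opaque opts struct can grow over time, here is a minimal single-file sketch of the pattern. All demo_* names are hypothetical and are not the functions added by this PR. Plain strings stand in for bson_t so the sketch has no driver dependency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Public side: callers see only an incomplete type plus setter functions. */
typedef struct demo_deleteoneopts demo_deleteoneopts_t;

/* Private side: would normally live in the .c file, so fields can be added
 * later without changing any caller-visible signature. */
struct demo_deleteoneopts {
   char *hint; /* NULL when unset */
};

demo_deleteoneopts_t *
demo_deleteoneopts_new (void)
{
   return calloc (1, sizeof (demo_deleteoneopts_t));
}

void
demo_deleteoneopts_set_hint (demo_deleteoneopts_t *self, const char *hint)
{
   assert (self);
   free (self->hint);
   self->hint = NULL;
   if (hint) {
      size_t len = strlen (hint) + 1;
      self->hint = malloc (len);
      if (self->hint) {
         memcpy (self->hint, hint, len);
      }
   }
}

void
demo_deleteoneopts_destroy (demo_deleteoneopts_t *self)
{
   if (self) {
      free (self->hint);
      free (self);
   }
}

/* The required field (filter) is an argument; optional fields come from
 * `opts`, which may be NULL. Writes a description of the operation to `out`. */
bool
demo_append_deleteone (const char *filter,
                       const demo_deleteoneopts_t *opts,
                       char *out,
                       size_t out_len)
{
   if (!filter || !out) {
      return false;
   }
   const char *hint = (opts && opts->hint) ? opts->hint : "(none)";
   int n = snprintf (out, out_len, "filter=%s hint=%s", filter, hint);
   return n >= 0 && (size_t) n < out_len;
}
```

Passing NULL for opts leaves every optional field unset, which mirrors the "May be NULL" comment in the declaration above.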
The existing C driver implementation of MongoCollection.bulkWrite uses bson_t for options:
MONGOC_EXPORT (bool)
mongoc_bulk_operation_remove_one_with_opts (mongoc_bulk_operation_t *bulk,
                                            const bson_t *selector,
                                            const bson_t *opts,
                                            bson_error_t *error); /* OUT */
However, I think a typed mongoc_bulkwrite_deleteoneopts_t is preferable to bson_t. It may ease development with compile-time type checks and autocomplete.
Naming
This PR uses a slightly different API naming pattern. Parts are logically separated by an underscore. Example: mongoc_bulkwriteopts_set_bypassdocumentvalidation.
This differs slightly from the established pattern of separating each word. Example in current API: mongoc_bulk_operation_set_bypass_document_validation.
I find the logical separation easier to read.
Validation
The existing C driver implementation of MongoCollection.bulkWrite includes default client-side validation:
validate: Construct a bitwise-or of all desired bson_validate_flags_t. Set to false to skip client-side validation of the provided BSON documents.
The default flags for writes are all the same:
const bson_validate_flags_t _mongoc_default_insert_vflags =
   BSON_VALIDATE_UTF8 | BSON_VALIDATE_UTF8_ALLOW_NULL |
   BSON_VALIDATE_EMPTY_KEYS;
UTF-8 validation appears broken. Passing invalid UTF-8 for string values does not result in an error. See: CDRIVER-4448. In effect, the default validation only applies BSON_VALIDATE_EMPTY_KEYS to check that there are no empty keys (e.g. {"": "foo"}).
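As a rough illustration of how bitwise-or flags select checks, and how a zero ("false") value skips validation, consider the sketch below. The DEMO_* flags and demo_validate_keys are hypothetical stand-ins, not libbson definitions, and only the empty-key check is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for validation flags; not libbson's values. */
typedef enum {
   DEMO_VALIDATE_NONE = 0,
   DEMO_VALIDATE_UTF8 = 1 << 0,
   DEMO_VALIDATE_EMPTY_KEYS = 1 << 1
} demo_validate_flags_t;

/* Returns true when `keys` passes the checks selected in `flags`.
 * Only the empty-key check is implemented in this sketch. */
bool
demo_validate_keys (const char *const *keys, size_t n, unsigned flags)
{
   assert (keys || n == 0);
   if (flags == DEMO_VALIDATE_NONE) {
      return true; /* validation skipped entirely */
   }
   if (flags & DEMO_VALIDATE_EMPTY_KEYS) {
      for (size_t i = 0; i < n; i++) {
         if (keys[i][0] == '\0') {
            return false; /* empty key, like {"": "foo"} */
         }
      }
   }
   return true;
}
```

A document with keys "a" and "" fails when the empty-keys flag is set and passes when the flags value is zero.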
CDRIVER-3731 is a request to configure validation at a broader scope. Quoting https://github.com/jeroen/mongolite/issues/206:
It probably makes sense to disable this globally
The C driver benchmarks disable the default validation. I re-enabled the validation and reran the benchmark. This resulted in significant performance drops in the base comparison:
Since UTF-8 validation is already broken, there are requests to disable validation, validation negatively impacts performance, and validation is not a spec requirement, this PR excludes default validation in MongoClient.bulkWrite.