vertx-mongo-client
vertx-mongo-client copied to clipboard
Sharing a MongoClient for multiple databases
Describe the feature
I would like to access multiple databases from a single MongoClient instance.
As far as I can understand the source code, this is not the case: a single MongoClient instance is always tightly linked to a single database (see MongoClientImpl.java in vertx-mongo-client)
private static class MongoHolder implements Shareable {
com.mongodb.reactivestreams.client.MongoClient mongo;
MongoDatabase db;
[...]
mongo = MongoClients.create(parser.settings());
db = mongo.getDatabase(parser.database());
Use cases
This would reduce the thread and heap pressure for Vert.x applications having to deal with many databases.
I've actually realized that each com.mongodb.reactivestreams.client.MongoClient instance seems to come with 2 threads, and 1 buffer pool (keeping buffers in the JVM heap). So if a Vert.x application has to deal with 150 databases, the JVM eventually end up with 300 threads and hundreds of MB of buffer pools.
It is of course possible to mitigate the risk by introducing a kind of LRU pool of Vert.x MongoClient (and that's what I've done, so I'm not blocked so far) but this is unfortunate since the MongoDB MongoClient seems to be able to handle multiple databases without this extra complexity (put differently: the MongoClient Database association seems to be Vert.x-idiomatic).
I hope (but I have not written a pure MongoDB program to give it a try!) using a single MongoClient would have real benefits regarding the JVM's number of threads and heap load.
This one really needs attention. vertx-mongo-client should support this since underling client does.
This limitation is introducing so much hassle when dealing with multi-tenancy.
I scoped the issue and mention that it can be contributed by the community to get contributors attention @bboyz269
@vietj @bboyz269 I am ready to discuss the topic with you if you are willing to.
I'm under the impression it is not so complex to implement but, because the io.vertx.ext.mongo.MongoClient
is mixing client and databases methods, we probably need a 2 steps plan: a first move with non-breaking API changes, and a second move with breaking API changes.
@bfreuden can you share a proposal ?
Yes.
Having a look at the methods of the io.vertx.ext.mongo.MongoClient
interface, I'm realizing it could almost be called io.vertx.ext.mongo.MongoDatabase
(even io.vertx.ext.mongo.MongoCollection
).
Let me show you the MongoClient.java
file without its database-only methods:
- methods having a
String collection
parameter -
getCollections
methods -
runCommand
methods because they are database-related (see https://mongodb.github.io/mongo-java-driver/3.6/javadoc/com/mongodb/client/MongoDatabase.html) - GridFs-related methods because they are database-related (see https://mongodb.github.io/mongo-java-driver/3.6/driver/tutorials/gridfs/)
cat MongoClient.java | sed 's,</i>,,g' | sed 's,\<i>,,g' | sed 's,http://,,g' | sed 's,and/or,,g' | tr "\n" "\t" | sed 's,\/\*\*[^\/]*/,,g' | tr "\t" "\n" | grep -v "String collection" | grep -v Fluent | sed '/^ *$/d' | grep -v "^ *Handler<AsyncResult.*$" | grep -v "^import" | grep -v "^package" | grep -v GridF | grep -v runCommand | grep -v getCollections
a>
*/
@VertxGen
public interface MongoClient {
String DEFAULT_POOL_NAME = "DEFAULT_POOL";
String DEFAULT_DB_NAME = "DEFAULT_DB";
static MongoClient create(Vertx vertx, JsonObject config) {
return new MongoClientImpl(vertx, config, UUID.randomUUID().toString());
}
static MongoClient createShared(Vertx vertx, JsonObject config, String dataSourceName) {
return new MongoClientImpl(vertx, config, dataSourceName);
}
static MongoClient createShared(Vertx vertx, JsonObject config) {
return new MongoClientImpl(vertx, config, DEFAULT_POOL_NAME);
}
@GenIgnore
static MongoClient createWithMongoSettings(Vertx vertx, JsonObject config, String dataSourceName, MongoClientSettings settings) {
return new MongoClientImpl(vertx, config, dataSourceName, settings);
}
Future<Void> close();
void close(Handler<AsyncResult<Void>> handler);
}
We can see that there are only close
and static factory methods left.
So we could create a io.vertx.ext.mongo.MongoDatabase
interface with all the other methods and add a getDatabase
method to the io.vertx.ext.mongo.MongoClient
interface:
public interface MongoClient {
...
MongoDatabase getDatabase(String name);
...
}
In the first step of the refactoring, the MongoClient
interface could either extend the MongoDatabase
interface (developer-friendly solution) or duplicate them (for the sake of being able to mark them as deprecated).
In the first approach, we could create another MongoClientImpl constructor taking a another MongoClientImpl and a database name as a parameter:
private boolean isDatabase = false;
public MongoClientImpl(MongoClientImpl client, String database) {
isDatabase = true
// more or less a shallow copy of client fields
}
The isDatabase
boolean will let us throw an UnsupportedOperationException
in close
methods of this particular instance.
The MongoClientImple.getDatabase()
method will simply call this constructor.
In the second approach we create a MongoDatabaseImpl
class and copy most of MongoClientImpl
methods into it.
Then we introduce a database
field in MongoClientImpl
:
public interface MongoClientImpl {
...
private MongoDatabaseImpl database;
MongoDatabase getDatabase(String name);
...
}
And we live with the burden of manually delegating MongoClientImpl
method calls (they don't change so often) to the MongoDatabaseImpl
instance (or maybe there's something nifty to do with dynamic proxies).
In a next comment I will discuss the problem of the factory methods. What's your first impression at this point? Am I completely out of my mind? :)
For the moment, factory methods require a database name in the config of the client.
As it must no longer be the case in the future, all database-related methods of the MongoClient should "throw" an IllegalStateException
if the database name is not set in the config.
Maybe we could log a deprecation warning to discourage users from specifying a database name MongoClient
factory methods?
Should we create factory methods in the new MongoDatabase
interface and let it have a getClient
method?
public interface MongoDatabase {
...
static MongoDatabase create(Vertx vertx, JsonObject config) {
return new MongoClientImpl(vertx, config, UUID.randomUUID().toString());
}
static MongoDatabase createShared(Vertx vertx, JsonObject config, String dataSourceName) {
return new MongoClientImpl(vertx, config, dataSourceName);
}
static MongoDatabase createShared(Vertx vertx, JsonObject config) {
return new MongoClientImpl(vertx, config, DEFAULT_POOL_NAME);
}
MongoClient getClient();
}
If that was the case the migration path for vanilla use cases would be from:
// config containing an "auth" database name
MongoClient client = MongoClient.createShared(vertx, config)
Future<Long> count = client.count("users", JsonObject())
...
client.close()
to:
// same config object as above (with a database name) => deprecation warning
MongoDatabase db = MongoDatabase.createShared(vertx, config);
// all those calls don't change (only the name of the variable does: `client` => `db`)
Future<Long> count = db.count("users", JsonObject())
...
db.getClient().close() // close requires access to the client
or:
// config without the database name => no deprecation warning
MongoDatabase db = MongoClient.createShared(vertx, config).getDatabase("auth");
Future<Long> count = db.count("users", JsonObject())
...
db.getClient().close() // close requires access to the client (it's the only method left on the client)
Users will still be able to use the original code but it will eventually throw an error (vert.x 5?) when a database name is provided in the config.
In the future (the second move I was referring to) we will remove (vert.x 5?) all database-related methods from the MongoClient
interface.
The Vert.x MongoClient is also a bit special compared to the Java driver in the sense that it also mixes MongoDatabase
and MongoCollection
methods (in addition to mixing MongoClient
methods).
That's why the first parameter of many methods is String collection
.
I'm not sure we want to push the refactoring that far though, because the migration path would be harder:
MongoDatabase db = MongoClient.createShared(vertx, config).getDatabase("auth");
// this would require to rewrite all method calls to add getCollection and remove the first parameter
Future<Long> count = db.getCollection("users").count(JsonObject())
...
db.getClient().close()
Do you think you could provide a new implementation from scratch of this instead ?
On Thu, Jul 8, 2021 at 6:19 PM bfreuden @.***> wrote:
The Vert.x MongoClient is also a bit special compared to the Java driver in the sense that it also mixes MongoDatabase and MongoCollection methods (in addition to mixing MongoClient methods).
That's why the first parameter of many methods is String collection.
I'm not sure we want to push the refactoring that far though, because the migration path would be harder:
MongoDatabase db = MongoClient.createShared(vertx, config).getDatabase("auth"); // this would require to rewrite all method calls to add getCollection and remove the first parameterFuture<Long> count = db.getCollection("users").count(JsonObject())... db.getClient().close()
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/vert-x3/vertx-mongo-client/issues/235#issuecomment-876572475, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXDCRGZG3PI6T3L4SZKJ3TWXFZHANCNFSM4TJZUZ2A .
@vietj
from sratch
do you mean I'm to create a new vertx-mongo-client2 repo in my github account, and to reuse the current Vert.x mongo client source code? If so: very probably, but I won't be able to start before before end of July.
@vietj sorry I've not been able to move on the subject : it's still on my mind but I'm too busy.
@bfreuden thanks for the feedback, moved this to 4.3.0 milestone
@tsegismont @vietj: since I have some spare time during summer, I tried to move on the topic.
I wrote a javadoc Doclet to analyze the source of the MongoDB reactive driver to have a better understanding of the MongoDB API before proceeding (you can open gephi files in the root of my repo with gephi).
I realized there are tons of classes (84) related to the creation of the MongoClient. When you set those apart, you cut the number of classes by more than 2 (around 70 remaining).
Based on that "study", I've been able to programmatically generate a Vert.x API based on the reactive API. For the moment I focused on reactive classes. There are 25 options and builder classes that should be translated using the Vert.x @DataObject
idiom. This is not done already. And there are still plenty of incorrect things here and there (gridfsbucket).
However I would like to have your first impression: would you say I am on the right track?
https://bfreuden.github.io/vertx-mongodb-gen/io/vertx/mongo/client/package-summary.html
Eventually I don't want to map each and every MongoDB class to a Vert.x class: I think it is OK for the API to return MongoDB classes as long as they are not part of the com.mongodb.reactivestreams
package. It think it is also very safe to leave methods returning MongoDB enums.
I also made sure to preserve useful information like server information and links to the MongoDB doc itself (e.g. find).
My repo: https://github.com/bfreuden/vertx-mongodb-gen
I've added some code to generate *Options
classes:
https://bfreuden.github.io/vertx-mongodb-gen/index.html
Some classes can't be easily automatically generated, so I've created them manually (empty classes):
https://bfreuden.github.io/vertx-mongodb-gen/io/vertx/mongo/package-summary.html
Next step will be to create *Options
classes from various *Publisher
classes. After that the "big picture" would be almost complete (for the moment I've ignored methods taking Publisher
as a parameter like upload GridFS methods).
Looks like I'm on the right track (see the SimpleTest.java
):
// get first item
mongoClient
.listDatabaseNames()
.first()
.map(res -> "first: " + res)
.onFailure(Throwable::printStackTrace)
.onSuccess(System.out::println);
// get all items
mongoClient
.listDatabaseNames()
.all()
.map(res -> "all: " + res)
.onFailure(Throwable::printStackTrace)
.onSuccess(System.out::println);
// get some items
mongoClient
.listDatabaseNames()
.some(5)
.map(res -> "some: " + res)
.onFailure(Throwable::printStackTrace)
.onSuccess(System.out::println);
// get stream of items
mongoClient
.listDatabaseNames()
.stream().pipeTo(new PrintlnWriteStream<>());
// count documents of a collections
mongoClient
.getDatabase("mydb")
.getCollection("mycol")
.countDocuments()
.map(res -> "count: " + res)
.onFailure(Throwable::printStackTrace)
.onSuccess(System.out::println);
https://bfreuden.github.io/vertx-mongodb-gen/index.html
But there's still a lot to be done:
- watch methods are not implemented
- gridfs is not implemented
- conversions are not implemented
- the JsonObject codec is missing
But for the moment almost everything is generated by the doclet. Not sure that was the right approach but I'm eager to test it on the sources of the latest version of the mongo reactive driver to see if it allows an easier upgrade path.
Today I've integrated the JsonObject
codec and created factory methods (createShared + config etc...) in the interface:
MongoClient mongoClient = MongoClient.create(vertx);
It allowed to check that retrieving data is ok:
// get first document of a collection
mongoClient
.getDatabase("mydb")
.getCollection("mycol")
.find()
.first()
.map(res -> "find: " + res)
.onFailure(Throwable::printStackTrace)
.onSuccess(System.out::println);
I also tested transactions but I get an error because my MongoDB does support them (plus I need help with the future/promise stuff because I'm a Kotlin coroutine guy):
// test transactions
MongoCollection<JsonObject> collection = mongoClient.getDatabase("cinebio")
.getCollection("documents2");
mongoClient.startSession()
.map(session -> new SessionContext<>(mongoClient, session))
.map(SessionContext::startTransaction)
.compose(context -> context.withResult(collection.insertOne(context.session(), new JsonObject().put("foo", "bar"))))
.compose(ResultSessionContext::commitTransaction)
.onSuccess(res -> System.out.println("transaction committed"))
.map(ResultSessionContext::result)
.onSuccess(res -> System.out.println("result is:" + res))
.onFailure(Throwable::printStackTrace);
I have started to port the existing MongoClientTestBase
into my project: vertx-mongodb-gen/src/test/java/io/vertx/ext/mongo/MongoClientTestBase.java
@vietj @tsegismont I would be interested by your feedback. The source code is almost entirely generated.
https://bfreuden.github.io/vertx-mongodb-gen/index.html
It is definitely closer to the original MongoDB API than it was before.
For the moment I am wondering how I will deal with object ids.
Almost all base tests are passing now. Some don't pass with useObjectId=false. I also need to finish the GridFS implementation.
The GridFS implementation is done.
I've started porting the GridFsTest
unit test in vertx-mongodb-gen/src/test/java/io/vertx/ext/mongo/GridFsTest.java
.
String fileName = createTempFileWithContent((1024 * 3) + 70);
String downloadFileName = createTempFile();
GridFSBucket bucket = GridFSBucket.create(mongoDatabase, "fs");
assertNotNull(bucket);
bucket
.drop()
.compose(dropped -> bucket.uploadFile(fileName))
.compose(res -> bucket.downloadByFilename(fileName).saveToFile(downloadFileName))
.onSuccess(event -> {
assertEquals(new File(fileName).length(), new File(downloadFileName).length());
testComplete();
})
.onFailure(this::fail);
At this point I think that, compared to the existing implementation, this implementation lacks the support of useObjectId=false
and a JSON-based configuration to create the MongoClient.
But it adds support for multiple databases and transactions.
@vietj @tsegismont Quite a lot of work so I would be very happy to have your feedback guys.
API doc is here: https://bfreuden.github.io/vertx-mongodb-gen/index.html
I will have a look
Thank you very much! I've added the possibility to create a MongoClient from a json configuration: This is not fully implemented but here is the idea:
Vertx vertx = Vertx.vertx();
// build a real vertx configuration object (data object)
MongoClientSettings settings = new MongoClientSettings()
.setClusterSettings(new ClusterSettings()
.setHosts(Collections.singletonList(new ServerAddress("localhost", 27018))));
// serialize to json for further use
JsonObject jsonSettings = settings.toJson();
System.out.println(jsonSettings);
// build a client from an real configuration object
MongoClient mongoClient = MongoClient.create(vertx, ClientConfig.fromSettings(settings));
// or build a client from a json config
MongoClient mongoClient2 = MongoClient.create(vertx, new ClientConfig(jsonSettings));
The tricky part with the MongoClientSettings
settings class is it is a mix of json serializable and non serializable things like callback, listeners, etc... I've done my best to put the json-compatible things in the vertx configuration object.
For other things you have:
MongoClient mongoClient = MongoClient.create(vertx, new ClientConfig(jsonSettings)
// can't be stored in json
.getPostInitializer().initializeConnectionPoolSettingsWith(
(vertxContext, builder) -> builder.addConnectionPoolListener(...)
)
);
I've started the implementation of useObjectId=false
.
It's done in a different way though: there are still object ids in the database but the client shows them as strings in the json:
mappedMongoClient = MongoClient.create(vertx, getConfig().useObjectIds(false)); // map object ids to strings
mappedMongoDatabase = mappedMongoClient.getDatabase(getDatabaseName());
mongoClient = MongoClient.create(vertx, getConfig().useObjectIds(true)); // leave object ids as-is
mongoDatabase = mongoClient.getDatabase(getDatabaseName());
MongoCollection<JsonObject> mappedColl = mappedMongoDatabase.getCollection(collection);
MongoCollection<JsonObject> coll = mongoDatabase.getCollection(collection);
JsonObject doc = new JsonObject().put("name", "john");
mappedColl.insertOne(doc, onSuccess(id -> {
mappedColl.find(doc).first(onSuccess(resultDoc -> {
Object _id = resultDoc.getValue("_id");
// object ids are mapped to string
assertTrue(_id instanceof String);
coll.find(doc).first(onSuccess(resultDoc2 -> {
assertNotNull(resultDoc2);
Object _id2 = resultDoc2.getValue("_id");
// although they are stored as ObjectIds in the DB
assertTrue(_id2 instanceof JsonObject);
assertTrue(_id2.toString().contains(JsonObjectCodec.OID_FIELD));
// you can search with a string id:
mappedColl.find(new JsonObject().put("_id", (String)_id)).first(onSuccess(resultDoc3 -> {
assertNotNull(resultDoc3);
Object _id3 = resultDoc3.getValue("_id");
assertEquals(_id, _id3);
testComplete();
}));
}));
}));
}));
I think the implementation of useObjectId=false
is complete: I've resurrected the MongoClientTest
test that is running the MongoClientTestBase
test with useObjectId=false
.
There is a behavioural difference with the existing mongo client though.
I've added replaceOne
methods without filter
parameter to the MongoCollection
in order to have something close to the save
and saveWithOptions
that are specific to the existing vert.x mongo client and that don't exist in the MongoDB driver.
@bfreuden can we setup a call to discuss this ?
@vietj certainly. I've updated my GH profile to include my email address. Feel free to send me an invite. My availabilities next week are: monday am, wednesday, thursday and friday am.