monger icon indicating copy to clipboard operation
monger copied to clipboard

Switch to a post-3.6.0 MongoDB Java client API

Open kvn-prhn opened this issue 6 years ago • 24 comments

Some of the functions in monger are using deprecated calls on the mongo java API

For example, the get-db function on line 117 in src/clojure/monger/core.clj uses getDB which is deprecated http://api.mongodb.com/java/current/com/mongodb/Mongo.html#getDB-java.lang.String-

kvn-prhn avatar Feb 24 '19 16:02 kvn-prhn

A quick review suggests that after MongoDB Java driver 3.6.0 the API recommended is quite a bit different from earlier versions and this is not a matter of updating a few method calls.

At the minimum it looks like that Monger will need to

Then explore what the modern GridFS API would look like, then expose new[-ish] features.

michaelklishin avatar Feb 24 '19 19:02 michaelklishin

Everything Monger relies on is deprecated, they just haven't been very clear about marking it that way. They figured that deprecating the class at the root of the tree was enough and didn't mark each node further down because they would be unreachable once the root was dropped (probably in 4.0.0).

Specifically the DB expected for each function call is also deprecated, in favor of a new MongoDatabase provided by the new MongoClient. The old collection classes (that monger uses internally) are also deprecated, but the layout is the same in that you can get a new collection class from the new MongoDatabase. Thus Monger can retain the same basic API, just using new classes.

expez avatar Feb 27 '19 13:02 expez

Internally, at Ardoq, I've basically re-implemented most of Monger to add support for transactions.

We would really like to upstream this code, but have been unsure about the status of Monger going forward.

expez avatar Feb 27 '19 13:02 expez

Well, Monger API can change to a reasonable extent to adapt and support more features. Are there any examples of what your API looks like?

michaelklishin avatar Feb 27 '19 22:02 michaelklishin

In our code-base I've created an implementation ns, to avoid cluttering the api of the core namespace. The ->db and db-> functions are updated versions of monger-conversion/{to,from}-db-object

If we forget about sessions and transactions for now, it basically looks like this:

;;; impl.clj
(defn mongo-database
  "Ensure that we have a `MongoDatabase` and not a deprecated
  `com.mongodb.DB` type db."
  [db]
  (if (instance? MongoDatabase db)
    db
    (.getDatabase (.getMongo db) (.getName db))))

(defn mongo-collection
  [db coll]
  (-> db mongo-database (.getCollection (name coll))))

(defmacro maybe-call-with-session
  [method-name db coll & args]
  `(let [mongo-collection# (mongo-collection ~db ~coll)]
     (if *session*
       (. mongo-collection# ~method-name *session* ~@args)
       (. mongo-collection# ~method-name ~@args))))

;;; monger.clj?

(defn find!
  "Find the first document matching `condition`. "
  ([db coll condition]
   (find! db coll condition (->projection)))
  ([db coll condition fields]
   (-> (impl/maybe-call-with-session find db coll (impl/->db condition))
       (.projection (->projection fields))
       .first
       impl/db->)))

Internally we're stilling passing the old soon-to-be deprecated DBs around everywhere, so everything is wrapped by a compat layer. This is probably how Monger would want to do it too, to make the API backward compatible.

Then to add transaction support I've added a new macro:

(defmacro with-transaction
  "Interact transactionally with `db`.

   NOTE: Only the functions in this namespace are 'transactionally
  aware'. Calls to functions in another namespace, that act on `db`,
  won't partake in the same transaction.

   NOTE: `body` will be subject to retries, so you need to think
  carefully about any side-effects that aren't limited to the current
  transaction.

   If you ever need to bail out of a transaction. e.g. after some
  sanity check, just throw an exception (which will be promptly rethrown).

   Example usage:
   (with-transaction my-db-from-elsewhere
    (let [foos (sort-by :bar (find! db :foo))
          bars (sort-by :id (find! db :bar))
          foobars (map merge foos bars)])
      ;; Transactions can be nested
      (with-transaction db
        (when-not (do-something-else-entirely? db)
          (throw (IllegalStateException. \"Aborting transaction.\")))))
      (insert-many! db :foobar foobars)
      (drop! db :foo)
      (drop! db :bar))"
  {:style/indent 1}
  [db & body]
  `(try
     (impl/with-session ~db
       (.startTransaction impl/*session*)
       (let [ret# (impl/run-with-retries!
                   (fn ~'transaction-thunk []
                     (do ~@body))
                   num-retries-per-transaction
                   (str "Transaction failed "
                        num-retries-per-transaction
                        " times.  Giving up."))]
         (impl/run-with-retries! (fn ~'commit-thunk []
                                   (.commitTransaction impl/*session*))
                                 num-retries-per-commit
                                 (str "Failed to commit transaction "
                                      num-retries-per-commit
                                      " times.  Giving up."))
         ret#))
     (catch MongoCommandException e#
       (log/error e# "Failed to perform transaction")
       (util/with-suppressed-errors
         (.abortTransaction impl/*session*))
       (throw e#))))

I've YAGNI'd some stuff related to the with-transaction macro, but my thinking is that any options other than the defaults (to configure sessions, retries etc etc) can be passed as an option map to the macro, which would then create another dynamic binding to make those settings visible to consumers further down the call-stack.

The code I've written contains a few Ardoqisms, which don't belong in Monger. If you want a PR I'd be happy to update the internals of the monger functions, leaving the semantics and names intact. The with-transaction macro is pretty opinionated because I think the defaults will be fine 99.9% of the time and are therefore hidden.

expez avatar Feb 28 '19 11:02 expez

@expez how about this: we switch default branch to 3.5.x-stable and begin moving all the helpers you have to the relevant namespaces in Monger. I'd have to familiarize myself with the new API first but it shouldn't take too long. Then we can find a way to keep the API reasonably backwards compatible.

master version should probably be bumped to 5.0 to indicate a major shift.

michaelklishin avatar Feb 28 '19 14:02 michaelklishin

Note that I can't promise that everything you have will be adopted as is. I hope that's not the goal and you just want to avoid maintaining your own client if Monger evolves in the right direction :)

michaelklishin avatar Feb 28 '19 14:02 michaelklishin

All I can say is that I need Monger to survive. I think there was a fundraiser at one point? I contributed once and I would contribute again. This library is very important to everyone who works with Clojure and MongoDB.

lkrubner avatar Feb 28 '19 18:02 lkrubner

I don't think you should be too concerned about Monger being abandoned, even though my hands are full with a lot of other projects :) I am happy to add more contributors to this repo going forward and grant them the permissions to produce release at some point.

michaelklishin avatar Feb 28 '19 18:02 michaelklishin

@expez how about this: we switch default branch to 3.5.x-stable and begin moving all the helpers you have to the relevant namespaces in Monger.

Sounds great! 👍 We moved to a bigger office today, so there's some chaos, but you can expect some PRs starting next week.

Note that I can't promise that everything you have will be adopted as is. I hope that's not the goal and you just want to avoid maintaining your own client if Monger evolves in the right direction :)

Having everything adopted as-is is definitely not a goal. More eyes usually leads to a better solution that can benefit everyone.

Ardoq also really wants to give back, to the Clojure community in general and perhaps Monger in particular. The company is growing fast now, but the first version wouldn't have been possible had it not been for the high-quality open-source projects available at the time.

expez avatar Feb 28 '19 22:02 expez

I might be interested in helping to refactor to newer MongoDB Java API as well as general Monger maintenance.

vadyalex avatar Mar 01 '19 12:03 vadyalex

I can probably help out a bit too. My employer uses monger a lot and we'd like to see support added for changestreams and transactions. I have some changestream code laying around I may be able to contribute (we adapt them into core.async channels but monger can probably just expose the callback API)

RutledgePaulV avatar Mar 26 '19 03:03 RutledgePaulV

I am traveling till April 23nd, after that I should be able to get started on this. Any spikes or investigations in this area would help (but I understand that no one usually wants to invest any time into experiments).

michaelklishin avatar Apr 08 '19 18:04 michaelklishin

I did a bit of exploratory programming on this recently as I'm also interested in seeing Monger being able to make use of newer MongoDB features. While my Clojure is more than a bit rusty, I might be able to share some of the API changes I made in my local fork shortly. I didn't wrap the changes in a way that doesn't expose the underlying data structure changes, but it might be helpful as a way to gauge the effort needed to implement the API changes that are required at the lower levels.

geuscht-m avatar Apr 25 '19 20:04 geuscht-m

Count me in as someone who would love to have a refreshed Monger based on the new Java driver API. I'm fine with a new API also if it makes sense? Perhaps a new namespace might be in order.

I think this effort should be coordinated somehow. Dedicated Slack channel for example? I've just created #mongo :)

orestis avatar Jun 12 '19 07:06 orestis

I was pointed to https://github.com/lupapiste/mingler which seems unmaintained, but contains a lot of code that deals with the new Mongo API.

orestis avatar Jun 12 '19 11:06 orestis

I'm still plucking away at my experiment over at https://github.com/geuscht-m/monger - the changes are all in the 3.x-api-experiments branch if someone wants to have a look.

The main issues I kept running into (from memory) are:

  • A lot of retrieval functions now return Iterables (com.mongodb.client.FindIterable et al), which leads to issues when using with-open inside the monger functions. Basically, the cursor is closed by the time the caller gets hold of it, which isn't very helpful. In some cases it's feasible to wrap the call to the Java driver inside Monger and create the seq of results to return, but that's not really something I think would be a good idea for large result sets.
  • Passing the options to the Java API has changed and now requires generating an Options object on the fly like FindAndUpdateOptions. The types are different, but I didn't think it was a big deal to convert them over
  • The current API has a few functions that return DBObjects. With the move from DBObject to org.bson.Document, this underlying API change percolates up into user applications. Same goes for some of the other areas that expose the underlying result types.
  • Read/Write preferences/concerns are now applied to the MongoConnection. I haven't implemented this API change yet, but I suspect this is going to be another one that is at least partially user visible.

There's still a fair amount of work I have to do to get the majority of tests to pass, but I'm making (slow) progress in my not so ample spare time. Either way it looks to me that there'll be some breaking changes in the monger API unless extra effort is made to hide the API differences and return the old API types where they leak out. Obviously one other option would be to deprecate the parts of the API where the leakage occurs, or have "new monger" with an API that is tied to the Java driver 3.6+ API.

geuscht-m avatar Jun 12 '19 12:06 geuscht-m

@geuscht-m I agree that a small Java wrapper might be optimal for wrapping FindIterable and such. I do this in Langohr to convert protocol responses to maps. It's not pure Clojure but I find that acceptable.

Migrating the API to org.bson.Document, where we have to expose it (as opposed to returning maps) makes sense.

Thank you for your time and efforts.

michaelklishin avatar Jun 12 '19 19:06 michaelklishin

As for how to coordinate this, a pull request with GitHub reviews sounds sufficient to me but we can set up a Discord channel or similar.

@geuscht-m would you like me to add you as a collaborator to this repo so you can push your branch here? I'm happy to add more collaborators in general but would like to have an option of amending the tests and API somewhat without having to submit pull requests for forks :)

michaelklishin avatar Jun 12 '19 19:06 michaelklishin

Sorry for delayed reply.

@michaelklishin I'd be honored if you added me as a collaborator.

One thing I had been noodling on recently is that I'll likely have to change the API for the various find functions etc to include a set of function that accept a MongoCollection instead of the MongoDatabase/collection name that is used right now. This is because in the 3.x the read preferences and read/write concerns are properties of the MongoCollection and setting them returns a new MongoCollection with these options set.

I'll try to integrate this change into my current branch and once that works, I can try to figure out how to bring the changes over to this repo in a meaningful manner.

geuscht-m avatar Jun 24 '19 23:06 geuscht-m

Hey, everyone! Any status updates on this?

TwiceII avatar Sep 10 '19 08:09 TwiceII

@TwiceII with the amount of deprecated methods used in monger, any shift to a new driver would effectively be an entirely new library, so I went ahead and created that library. Have a look and see if it suits you.

https://github.com/gnarroway/mongo-driver-3

gnarroway avatar Nov 22 '19 10:11 gnarroway

Sorry, I haven't had that much time for working on more exploratory programming around this. The main issues are:

  • There is no direct replacement for save() and that part of the 2.x API isn't that easy to emulate (although it's possible to do so if 100% API compatibility isn't necessary)
  • Parts of the existing monger API use status return values that are passed through from the MongoDB Java driver. The underlying APIs have changed for the 3.x Java driver API and no longer return success/failure codes. Instead, they throw exceptions in case of an error.

Based on how far I got with my exploratory conversion, there would be breaking API changes after the conversion to the 3.x Java driver. I do believe large parts of the API would stay very similar, but the possibilities of providing a drop-in replacement are pretty slim.

geuscht-m avatar Dec 06 '19 21:12 geuscht-m