
Lazy requires are not thread safe

Open vincentjames501 opened this issue 3 years ago • 7 comments

One of the first things our application does is start a connection pool. In some cases, the application immediately starts hitting Redis in parallel. We occasionally see issues like the following:

           java.lang.IllegalAccessError: output-type is not public
clojure.lang.Compiler$CompilerException: Syntax error compiling at (celtuce/impl/server.clj:1:1).
    data: #:clojure.error{:phase :compile-syntax-check,
                          :line 1,
                          :column 1,
                          :source "celtuce/impl/server.clj"}
java.util.concurrent.ExecutionException: Syntax error compiling at (celtuce/impl/server.clj:1:1).

and

java.lang.IllegalArgumentException: No implementation of method: :evalsha of protocol: #'celtuce.commands/ScriptingCommands found for class: com.sun.proxy.$Proxy7
Exception in thread "main" Syntax error compiling at (celtuce/impl/cluster.clj:1:1).

Because it's a race condition, it can be tricky to reproduce. We suspect the cause is the lazy requires, which are not thread safe.

...
  RedisConnector
  (commands-sync [this]
    (require '[celtuce.impl.server])
...
And other locations where (require ...) is called.
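For context, here's a minimal sketch of the kind of concurrent access that can trigger this: several threads racing through a plain (require ...) of the same not-yet-loaded namespace. Being a race, it may or may not reproduce reliably:

;; Race sketch: parallel first-time requires of the same namespace, which is
;; effectively what the lazy requires do when commands are fetched in parallel
;; right after startup.
(->> (repeatedly 16 #(future (require 'celtuce.impl.server)))
     (doall)       ;; start all the futures before derefing any of them
     (run! deref))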

I'm not sure I fully follow why the lazy requires are necessary (I suspect it has to do with extending types and Manifold), but I wonder if there is a slightly more eager place where they could go.

It may make sense to do some locking similar to how clojure.core/serialized-require works on Clojure 1.10+.

(locking clojure.lang.RT/REQUIRE_LOCK
  (require '[blah.blah]))
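Concretely, a small helper along those lines (just a sketch, assuming Clojure 1.10+, where clojure.lang.RT/REQUIRE_LOCK exists and clojure.core/serialized-require takes the same lock):

(defn safe-require
  "Requires libs while holding Clojure's global require lock, so concurrent
   first-time loads don't race."
  [& libs]
  (locking clojure.lang.RT/REQUIRE_LOCK
    (apply require libs)))

;; Drop-in replacement at the lazy-require call sites:
(safe-require 'celtuce.impl.server)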

vincentjames501 avatar Apr 27 '21 21:04 vincentjames501

Thanks for pointing out this tricky issue. I released 0.4.2 with the locking fix; let me know how it goes.

lerouxrgd avatar Apr 27 '21 22:04 lerouxrgd

That was fast! 😀 Thank you, and I apologize for my confused comment which briefly offered an incorrect suggestion.

brunchboy avatar Apr 27 '21 22:04 brunchboy

@lerouxrgd , thanks for the super quick response!

Something that still bothers me is that the dynamic requires feel like a bit of a smell. Consider if there were a core.async version of the async commands in addition to Manifold: the use of extend-type makes it basically impossible to run them both within the same codebase.

Have you considered just having a default implementation of async commands that ships in core? For example:

  (hgetall [this k]
    (d/chain (d/->deferred (.hgetall this k))
             #(into {} %)))

Would just be:

(hgetall [this k]
  (-> (.hgetall this k)
      (.thenApply (reify java.util.function.Function
                    (apply [_ m]
                      (into {} m))))))

It doesn't add any additional dependencies, and it still returns something that can be deref'd from Clojure, easily wrapped in a Manifold deferred and chained further (or used with core.async, etc.). I think this would allow everything to be required up front, and folks could wrap the responses from there. The Manifold module could then just re-export the commands namespace functions, wrapping the result of each future-returning function with d/->deferred.
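To make that concrete, here's a small self-contained sketch (hypothetical names, not celtuce's actual API) of a core command that returns a plain CompletableFuture transformed with .thenApply, which callers can deref directly or coerce to a Manifold deferred themselves:

(ns example.core-async-return
  (:require [manifold.deferred :as d])
  (:import (java.util.concurrent CompletableFuture)
           (java.util.function Function)))

(defn fake-hgetall
  "Stand-in for an hgetall-style command: yields field/value pairs and turns
   them into a map, with no Manifold dependency."
  []
  (-> (CompletableFuture/completedFuture [["field" "value"]])
      (.thenApply (reify Function
                    (apply [_ kvs] (into {} kvs))))))

(comment
  @(fake-hgetall)                                ;; => {"field" "value"}, plain deref
  @(d/chain (d/->deferred (fake-hgetall)) keys)) ;; => ("field"), wrapped in Manifold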

vincentjames501 avatar Apr 29 '21 02:04 vincentjames501

It's a possibility indeed, but it's a lot of refactoring. If everything is fine, I'd accept such a PR though.

lerouxrgd avatar Apr 29 '21 11:04 lerouxrgd

@vincentjames501 do you still observe the thread-safety issue after the require locking fix?

lerouxrgd avatar May 12 '21 23:05 lerouxrgd

@lerouxrgd, it appears to have fixed it. The only unfortunate side effect is that it adds some additional locking contention and slows down every Redis call by a small but measurable amount. Adopting the pattern I mentioned above could remove the need for dynamic requires entirely, which would be cool!

vincentjames501 avatar May 13 '21 00:05 vincentjames501

I had the reported problem and got it fixed by upgrading celtuce. Kudos to everyone involved 🙌

The only unfortunate side effect is that it adds some additional locking contention and slows down every Redis call by a small but measurable amount.

Locking an object and performing a cheap operation (require, which is fast after the first call) inside the synchronized block should be orders of magnitude faster than the I/O typically associated with Redis ops.

Especially in a production environment, where Redis will live on a separate machine.
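For a rough sense of scale (a quick sketch, not a rigorous benchmark), re-requiring an already-loaded namespace under the lock is just a lock acquisition plus a loaded-libs lookup:

(require 'clojure.string)  ;; make sure it's loaded once, outside the timing

(time
 (dotimes [_ 10000]
   (locking clojure.lang.RT/REQUIRE_LOCK
     (require 'clojure.string))))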

My humble input would be to consider it a non-issue. Other refactorings might be desirable of course.

vemv avatar Aug 06 '21 23:08 vemv