
Performance compared to J2V8?

Open Taytay opened this issue 1 year ago • 12 comments

First of all, thank you for taking on the challenge of replacing J2V8. I really like your approach and thoughtfulness!

We are excited to consider migrating from J2V8 for our Android app, but I was doing some very rudimentary profiling, and it looks like Javet might be slower in a number of cases. One of our basic use cases is just calling into a JavaScript function frequently (to approximate a very chatty Java<->JS API), and it looks like Javet is about 40% slower. I haven't done a ton of other profiling yet, but the JS engine itself appears to be similarly fast. (If I call into a large, slow JS function, I see similar perf with J2V8 and Javet.)

I know microbenchmarks can be lame, and this isn't the only way we are evaluating Javet, but my question is twofold: 1: Have you done much profiling work against J2V8? 2: Is there something I should do to speed this up?

@Test
    fun compareJ2V8JavetPerf() {
        val v8 = V8.createV8Runtime()
        v8.executeScript("function emptyFunc(){return;}")

        measure("j2V8Round1") {
            for (i in 0 until 1_000_000) {
                v8.executeVoidFunction("emptyFunc", null)
            }
        }
        measure("j2V8Round2") {
            for (i in 0 until 1_000_000) {
                v8.executeVoidFunction("emptyFunc", null)
            }
        }

        val instance: V8Host = V8Host.getV8Instance()
        val v8Runtime: V8Runtime = instance.createV8Runtime()
        v8Runtime.v8Locker.use { _ ->
            v8Runtime.getExecutor("function emptyFunc(){}").executeVoid();

            measure("JavetRound1") {
                var executor = v8Runtime.getExecutor("emptyFunc();");
                for (i in 0 until 1_000_000) {
                    executor.executeVoid();
                }
            }
            measure("JavetRound2") {
                var executor = v8Runtime.getExecutor("emptyFunc();");
                for (i in 0 until 1_000_000) {
                    executor.executeVoid();
                }
            }
        }
    }

I'm seeing output like this (on my MacBook running the simulator):

⏱ j2V8Round1: 962.841126 ms
⏱ j2V8Round2: 934.805917 ms
⏱ JavetRound1: 1524.791793 ms
⏱ JavetRound2: 1512.266751 ms

(If I take out the v8Locker line, it's about twice as slow.)

The real benchmark we are working on is more about marshaling a large object from JS to Java, and it's quite possible that Javet is better there, but after seeing these results, I wanted to ask about it.

I was surprised that Javet was slower, since J2V8 spends a third of its executeVoidFunction time inside checkThread, and Javet gets rid of that completely! From what little I can tell from the profiler and some basic code poking, about 25% of the time is in checkV8Runtime(). Even when I got rid of that by calling into v8Runtime.v8Internal directly, 1M calls still took around 1250 ms.

Thanks again!

Taytay avatar Jul 22 '22 05:07 Taytay

Thank you for the excellent performance comparisons and analysis. I'd like to share my thoughts for your reference.

Have you done much profiling work against J2V8?

No, I haven't done so for the following reasons.

  • Javet's release cycle is aligned with V8's major stable release cycle, and different versions of V8 sometimes perform differently.
  • Javet doesn't optimize for Java primitives the way J2V8 does. In return, Javet unlocks some fancy capabilities, e.g. JavetProxyConverter, JavetBridgeConverter, etc., which are much loved by major Javet Android users.
  • I don't work on Android development frequently, so Android-specific optimization hasn't been brought to the table yet, which means there's room for improvement.

Is there something I should do to speed this up?

That test case isn't quite fair. Let's see what really happens behind the scenes.

Operation                                  J2V8   Javet
1. Compile the code                        No     Yes
2. Get the function by name (emptyFunc)    Yes    No
3. Execute the function                    Yes    Yes

So in each loop, the Javet test code performs (1) and (3), whereas the J2V8 test code performs (2) and (3). Obviously, (2) is faster than (1). What you could do instead is get the global object and invoke the function on it, performing (2) and (3) directly, as follows.

v8Runtime.getExecutor("function emptyFunc(){}").executeVoid();
val globalObject = v8Runtime.getGlobalObject();

measure("JavetRound1") {
    for (i in 0 until 1_000_000) {
        globalObject.invokeVoid("emptyFunc");
    }
}
measure("JavetRound2") {
    for (i in 0 until 1_000_000) {
        globalObject.invokeVoid("emptyFunc");
    }
}

However, this is still not fast enough. There's a way to skip (2).

v8Runtime.getExecutor("function emptyFunc(){}").executeVoid();
v8Runtime.getGlobalObject().get<V8ValueFunction>("emptyFunc").use { func ->
    measure("JavetRound1") {
        for (i in 0 until 1_000_000) {
            func.callVoid(null);
        }
    }
    measure("JavetRound2") {
        for (i in 0 until 1_000_000) {
            func.callVoid(null);
        }
    }
}
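The trap in the original benchmark generalizes well beyond Javet: repeating step (1) per call is the same mistake as rebuilding a regex inside a loop, while holding a function handle is like compiling the regex once. As a stdlib-only analogy (no Javet API involved; the function names here are mine, for illustration):

```kotlin
// Stdlib analogy for steps (1) vs (2): recompiling per iteration vs reusing a handle.
fun countMatchesRecompiling(lines: List<String>): Int =
    // Compiles the pattern on every iteration, like calling getExecutor(...) per call.
    lines.count { line -> Regex("""\d+""").containsMatchIn(line) }

fun countMatchesCached(lines: List<String>): Int {
    val pattern = Regex("""\d+""") // compiled once, like the cached V8ValueFunction handle
    return lines.count { line -> pattern.containsMatchIn(line) }
}
```

Both return the same result; only the per-iteration cost differs, which is exactly the difference between (1) and (2) above.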

checkV8Runtime() vs checkThread()

That's a nice observation. To prevent users from shooting themselves in the foot, I put in checkV8Runtime() as a failsafe, which certainly carries some performance overhead. In the future, I may expose a flag for experienced users like you to disable that check.

Getting rid of checkThread() is one of Javet's fancy features because it allows multiple threads to interact with one V8 runtime without synchronization and without segmentation faults. My other project, Javenode, relies on that heavily.

V8Locker is designed precisely for extreme performance. I'm glad to hear it really improves the performance in your test.

Hope these make sense. Please let me know if you have any questions. I look forward to your migration.

caoccao avatar Jul 22 '22 06:07 caoccao

Nice! This was a great explanation. My mental model of the API was wrong. I was trying to figure out how to get a handle that avoids recompilation on every call, and thought I'd done so.

Things are much better (about 50% faster than my previous code and 10% faster than J2V8) after this change:

⏱ j2V8Round1: 907.905709 ms
⏱ j2V8Round1: 874.13675 ms
⏱ JavetFastRound1: 812.108292 ms
⏱ JavetFastRound2: 797.450208 ms

I wanted to see if I could get it even faster, so I profiled this: in a 2.84 s sample of Javet's callExtended function, 945 ms (33%) was spent inside checkV8Runtime. I thought: oh cool, I can get rid of a ton of time by reaching directly into v8Internal and avoiding this check!

val internal = v8Runtime.v8Internal
measure("JavetWithInternalRound1") {
    for (i in 0 until 1_000_000) {
        internal.call<V8Value?>(func, null, false, null)
    }
}
measure("JavetWithInternalRound2") {
    for (i in 0 until 1_000_000) {
        internal.call<V8Value?>(func, null, false, null)
    }
}

But as you can see, it's a bit faster, but not by much:

⏱ j2V8Round1: 962.172834 ms
⏱ j2V8Round2: 927.631209 ms
⏱ JavetFastRound1: 762.648292 ms
⏱ JavetFastRound2: 749.521958 ms
⏱ JavetWithInternalRound1: 736.924417 ms
⏱ JavetWithInternalRound2: 737.640959 ms

I profiled it, and the time spent in checkV8Runtime is indeed gone, but the overall run didn't get 33% faster as I had hoped. I'm new to profiling on Java/Android, though, so perhaps there is some profiling overhead that exaggerates the impact of small function calls. I do feel like I'm doing something wrong here...

I will say: I do love things that prevent me from making obvious mistakes, so I like how defensive Javet is with its calls to checkV8Runtime. But I also like the idea of being able to say: "Trust me during this part... I'm about to call into JS a LOT, and I know what I'm doing." Before seeing that checkV8Runtime was not the overhead I'd feared, I was going to ask you for some sort of block that turns checkV8Runtime on its head. I could enter it to tell Javet, "Don't check on the V8 engine while I'm executing code in this block, but if anyone tries to dispose of the engine while I'm in this block, THEN you can freak out."

And I love that you allowed for multithreaded access! That has been a source of pain for us with J2V8.

Taytay avatar Jul 22 '22 17:07 Taytay

Thank you for posting the revised test result. I'm glad to know that works.

checkV8Runtime() can be JIT-optimized (down to a simple CMP + JMP, I think), so the performance impact is subtle in the long run. However, when you profile it, the additional bytecode instrumentation surrounding the call may prevent that JIT optimization. In this case, I suspect the profiling bytecode is heavier than the call itself, which leads to the misleading 33% figure.
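As a rough illustration of why such a guard is nearly free once JIT-compiled, here is a sketch in plain Kotlin. ClosableRuntime and checkAlive() are made-up names for illustration, not Javet internals:

```kotlin
// Hypothetical sketch of a liveness guard similar in spirit to checkV8Runtime().
// On the hot path the JIT typically inlines checkAlive() down to roughly one
// load + compare + branch against the volatile flag.
class ClosableRuntime {
    @Volatile
    var closed = false
        private set

    private fun checkAlive() {
        if (closed) throw IllegalStateException("Runtime already closed")
    }

    fun call(): Int {
        checkAlive() // cheap after JIT compilation
        return 42
    }

    fun close() {
        closed = true
    }
}
```

Under a profiler, however, instrumentation injected around every call can defeat exactly this inlining, which is why the guard can look far more expensive than it really is.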

As checkV8Runtime() is not actually slow, a flag for turning it off still couldn't avoid that simple CMP + JMP, so I tend to think it's not worth doing. Back to its fundamental goal of preventing a core dump: assuming multiple threads are interacting with the same V8 runtime, one thread closing the runtime would, without this check, immediately trigger a core dump in the next call from another thread. So it's designed to keep the JVM from crashing. I suggest using the regular Javet API unless you clearly know what you are doing, in which case you may use V8Internal to squeeze out better performance.

The Javet API is designed to be open to extension and low-level tweaking. That's why you can find a way to circumvent checkV8Runtime(). Javet also comes with its own mindset, which is quite different from J2V8's. I bet once you get used to Javet, you won't want to go back to J2V8 anymore. Wish you a successful migration.

caoccao avatar Jul 23 '22 00:07 caoccao

Excellent point about checkV8Runtime being essentially JITted away and how the profiler might have interfered with that. And yes, once I saw the actual timing results, I agree that trading that safety for the minuscule perf increase isn't worth it.

I bet once you get used to Javet, you won't want to go back to J2V8 anymore.

Agreed!

Wish you a successful migration.

Thanks!

Taytay avatar Jul 23 '22 13:07 Taytay

I already learned a lot about Javet while reading through this thread, so thank you! Here's a very similar microbenchmark, introducing method calls into the mix:

val jsSource = """
    function emptyFunc() { return; }
    class SomeClass { emptyMethod() { return; } }
""".trimIndent()

val j2v8 = V8.createV8Runtime()
j2v8.executeScript(jsSource)
measure("J2V8 function call") {
    for (i in 0 until 1_000_000) {
        j2v8.executeVoidFunction("emptyFunc", null)
    }
}
val j2v8SomeClassInstance = j2v8.executeObjectScript("new SomeClass()")
measure("J2V8 method call") {
    for (i in 0 until 1_000_000) {
        j2v8SomeClassInstance.executeVoidFunction("emptyMethod", null)
    }
}

val javetRuntime = V8Host.getV8Instance().createV8Runtime<V8Runtime>();
val empty = emptyArray<V8ValueObject>() // Hack to fix overload resolution ambiguity when calling callVoid et al
javetRuntime.v8Locker.use { _ ->
    javetRuntime.getExecutor(jsSource).executeVoid();
    val globalObject = javetRuntime.getGlobalObject();

    measure("Javet function call (invokeVoid)") {
        for (i in 0 until 1_000_000) {
            globalObject.invokeVoid("emptyFunc", *empty)
        }
    }
    globalObject.get<V8ValueFunction>("emptyFunc").use { func ->
        measure("Javet function call (callVoid)") {
            for (i in 0 until 1_000_000) {
                func.callVoid(null, *empty)
            }
        }
    }
    javetRuntime.getExecutor("new SomeClass()").execute<V8ValueObject>().use { obj ->
        measure("Javet method call (invokeVoid)") {
            for (i in 0 until 1_000_000) {
                obj.invokeVoid("emptyMethod", *empty)
            }
        }
    }
    javetRuntime.getExecutor("new SomeClass()").execute<V8ValueObject>().use { obj ->
        obj.get<V8ValueFunction>("emptyMethod").use { method ->
            measure("Javet method call (callVoid)") {
                for (i in 0 until 1_000_000) {
                    method.callVoid(obj, *empty)
                }
            }
        }
    }
}

Results:

⏱ J2V8 function call: 761 ms
⏱ J2V8 method call: 885 ms
⏱ Javet function call (invokeVoid): 1215 ms
⏱ Javet function call (callVoid): 811 ms
⏱ Javet method call (invokeVoid): 1227 ms
⏱ Javet method call (callVoid): 2406 ms

For function calls, it looks like callVoid's performance is very similar to J2V8 🎉

On the other hand, when looking at method calls, two things stick out:

  • When Javet's invokeVoid is called on an instance of a class, it is as fast as when called on the global object, but significantly slower than J2V8 method calls, which also have to look up the method by name.
  • I wonder why callVoid gets 3x slower when a receiver is given. I noticed that when the receiver is null, this is null within the method body, so I assume that the receiver argument is the way of letting the method know on which instance it is operating. Is this correct?

Do you have any tips for us for how to make method calls faster?

tiwoc avatar Jul 28 '22 17:07 tiwoc

Thank you for the excellent test.

If you review the Javet source code, you will find the invoke* and call* families allow both Java objects and V8Value objects as arguments. That flexibility implies a certain performance overhead because V8VirtualValueList kicks in for the object conversion. The empty array typed V8ValueObject doesn't match the function signature that expects V8Value, so the calls hit the expensive code path. You may try using V8Value instead.

Also, you may try V8Internal#invoke or V8Internal#call to see if it's getting better.

Regarding call and invoke, please check out this doc.

caoccao avatar Jul 28 '22 22:07 caoccao

Regarding call and invoke, please check out this doc.

That's very helpful, thank you.

The empty object array with type V8ValueObject doesn't match the express function signature which requires V8Value so that the calls hit the expensive code path. You may try to use V8Value instead.

Since V8ValueObject is a subclass of V8Value, calling any of the invoke* or call* methods with arguments of type V8ValueObject does not go through the slow path (verified via debugger). And indeed: Changing empty's definition to emptyArray<V8Value>() doesn't change the results.

Also, you may try V8Internal#invoke or V8Internal#call to see if it's getting better.

Good point. Interestingly, V8Internal#call is a good bit faster on functions, but doesn't speed up method calls. V8Internal#invoke isn't faster than invoke* in my testing.

I didn't get anything useful out of the Android Studio profiler yet, but I'm looking into it.

tiwoc avatar Aug 01 '22 11:08 tiwoc

There are a few things worth keeping an eye on:

  • How many rounds of GC take place.
  • Which test cases are impacted by the GC.
  • Whether there is a warm-up for the JVM.
  • Whether the scope of the lock could be smaller. (Similar to a DB transaction: you may want to insert 1 million rows and commit in one transaction, but the performance would suffer.)
  • What the test code that calls the internal looks like.
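The measure helper used throughout this thread is never shown. A minimal sketch of one way to address the warm-up and GC bullets looks like the following; the name, signature, and defaults are illustrative, not the posters' actual helper:

```kotlin
// Illustrative benchmark helper: warm up the JIT first, hint a GC so a
// collection is less likely to land inside the timed run, then time the block.
inline fun measure(label: String, warmupIterations: Int = 3, block: () -> Unit) {
    repeat(warmupIterations) { block() } // let the JIT compile the hot path
    System.gc() // only a hint, but it reduces GC noise between rounds
    val startNanos = System.nanoTime()
    block()
    val elapsedMs = (System.nanoTime() - startNanos) / 1_000_000.0
    println("⏱ $label: $elapsedMs ms")
}
```

For serious comparisons, a harness like JMH handles warm-up, forking, and dead-code elimination far more rigorously than a hand-rolled timer.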

By the way, I'm not a big fan of this kind of test, because the code paths are similar and so is the performance. It's the features that attract Javet users.

caoccao avatar Aug 01 '22 12:08 caoccao

It sounds like you feel these perf tests are unfair or cast Javet in a negative light, but we are definitely not trying to do that!

We agree that Javet's feature set and well-maintained codebase are the primary appeal. In particular, it is multi-threading friendly, which ranks high on our wish list! We are very excited about it for exactly those reasons and hope to migrate to it.

But we have a large body of JavaScript already written, and use cases that involve frequently calling into JS or marshaling large objects to/from JS. The performance of even basic operations in J2V8 is already a concern for us, so in spite of the features, we can't afford to move to Javet if it turns out to be significantly slower for some reason. Since the code paths should be similar, I didn't expect to see differences in execution time for primitive operations either! When we did, we naturally assumed we were doing something wrong, and it looks like we were in many cases. Some results still puzzle us, though.

If there are more realistic ways to benchmark this, we are certainly open to suggestions. We are trying to keep the examples minimal to ensure we are comparing apples to apples, but we might be introducing other issues.

Thank you for your help thus far. You certainly don't owe us anything. I just thought the context would help explain why we are asking these questions.

Taytay avatar Aug 01 '22 17:08 Taytay

Thank you for sharing the background. Having talked to many Javet users, I find these points very common and reasonable.

The reason I'm not that interested in performance tests against pure JNI invocation is that there's no apples-to-apples comparison in this case. Let's look at some of the exciting features that Javet brings:

  • Eliminate the core dump.
  • Enable the multi-threading without synchronization.
  • Support arbitrary object converters.
  • Support injecting arbitrary Java objects.

These features don't come with zero performance overhead. Measuring JNI invocation is, to some extent, measuring two designs with fundamental differences.

Also, my personal style is that I wouldn't rush into performance comparisons before mastering the target libraries, because I would very likely get misleading results that affect the adoption decision.

caoccao avatar Aug 02 '22 01:08 caoccao

Those are excellent points. I'm excited to try out the fancier stuff that we aren't even allowed to do with J2V8, but I was hesitant to examine it or get excited if method invocation was 4x slower than J2V8. After your guidance and your pointing out our mistakes, it's nowhere near that slow, but that's why we started with the microbenchmarks.

If we were writing a body of JS code from scratch I'd design it around Javet's strengths. But as it stands, we have existing use patterns that we need to accommodate until we have that luxury. I might have some further questions for you about the best way to go about using the best parts of Javet to perform some equivalent operations.

Taytay avatar Aug 02 '22 18:08 Taytay

Sure, changing an airplane engine while in flight always seems to be a headache. You're welcome to share the pain points. Sometimes the usage patterns themselves need to be refreshed to get better performance, because Javet brings some new patterns.

caoccao avatar Aug 03 '22 00:08 caoccao