Batching calls within a lock / scope
While I was fiddling with bindings in a different language, I figured out that what was expensive was all the locking and scoping done for every call. If you can instead call back into Go while holding a lock and/or a scope, you can really speed things up.
My thinking is that with this we can batch all value creation (for structs, objects, etc.), or any logically grouped set of operations, within a single isolate lock and/or context scope.
Example implementation:
type Context struct {
    // ...
    scopeCallbacks map[int]chan struct{}
}
// ...
var scopeEnds = make(map[int]chan struct{})
//export __go_ctx_scoped
func __go_ctx_scoped(cbIDStr C.String) {
    // Identical to go_callback_handler:
    cbID := C.GoStringN(cbIDStr.ptr, cbIDStr.len)
    parts := strings.SplitN(cbID, ":", 2)
    ctxID, _ := strconv.Atoi(parts[0])
    callbackID, _ := strconv.Atoi(parts[1])

    contextsMutex.RLock()
    ref := contexts[ctxID]
    if ref == nil {
        panic(fmt.Errorf(
            "Missing context pointer during callback for context #%d", ctxID))
    }
    ctx := ref.ptr
    contextsMutex.RUnlock()

    ch := ctx.scopeCallbacks[callbackID]
    ch <- struct{}{} // we're ready
    doneCh := scopeEnds[callbackID]
    <-doneCh // hold onto this scope until done!
}
func (ctx *Context) WithScope() chan struct{} {
    ctx.nextCallbackId++
    id := ctx.nextCallbackId
    ch := make(chan struct{}, 1)
    ctx.scopeCallbacks[id] = ch

    cbIDStr := C.CString(fmt.Sprintf("%d:%d", ctx.id, id))
    defer C.free(unsafe.Pointer(cbIDStr))

    doneCh := make(chan struct{}, 1)
    scopeEnds[id] = doneCh

    go func() {
        // Run this async, because it blocks until out of scope.
        addRef(ctx)
        C.v8_with_ctx_scope(ctx.ptr, cbIDStr)
        decRef(ctx)
    }()

    <-ch // this means we're all scoped and ready
    return doneCh
}
func (ctx *Context) NewObjectWScope() *Value {
    return ctx.newValue(C.v8_object_new(ctx.iso.ptr), C.kObject)
}

func (ctx *Context) NewObjectWoutScope() *Value {
    return ctx.newValue(C.v8_Object_New(ctx.ptr), C.kObject)
}

func (v *Value) SetWithScope(name string, value *Value) error {
    name_cstr := C.CString(name)
    errmsg := C.v8_object_set(v.ctx.ptr, v.ptr, name_cstr, value.ptr)
    C.free(unsafe.Pointer(name_cstr))
    return v.ctx.iso.convertErrorMsg(errmsg)
}
The C++ used:
void __go_ctx_scoped(String id);

void v8_with_ctx_scope(ContextPtr ctxptr, char* id) {
  VALUE_SCOPE(ctxptr);
  __go_ctx_scoped(DupString(id));
}
PersistentValuePtr v8_object_new(IsolatePtr iso) {
  auto isolate = static_cast<v8::Isolate*>(iso);
  v8::Local<v8::Object> obj = v8::Object::New(isolate);
  return new Value(isolate, obj);
}

PersistentValuePtr v8_Object_New(ContextPtr ctxptr) {
  VALUE_SCOPE(ctxptr);
  v8::Local<v8::Object> obj = v8::Object::New(isolate);
  return new Value(isolate, obj);
}

Error v8_object_set(ContextPtr ctxptr, PersistentValuePtr valueptr,
                    const char* field, PersistentValuePtr new_valueptr) {
  auto isolate = static_cast<Context*>(ctxptr)->isolate;
  auto ctx = static_cast<Context*>(ctxptr)->ptr.Get(isolate);

  Value* value = static_cast<Value*>(valueptr);
  v8::Local<v8::Value> maybeObject = value->Get(isolate);
  if (!maybeObject->IsObject()) {
    return DupString("Not an object");
  }
  // We can safely call `ToLocalChecked`, because we've just
  // checked that the value is an object above.
  v8::Local<v8::Object> object =
      maybeObject->ToObject(ctx).ToLocalChecked();

  Value* new_value = static_cast<Value*>(new_valueptr);
  v8::Local<v8::Value> new_value_local = new_value->Get(isolate);

  v8::Maybe<bool> res =
      object->Set(ctx, v8::String::NewFromUtf8(isolate, field), new_value_local);
  if (res.IsNothing()) {
    return DupString("Something went wrong -- set returned nothing.");
  } else if (!res.FromJust()) {
    return DupString("Something went wrong -- set failed.");
  }
  return (Error){nullptr, 0};
}
Benchmarks:
func BenchmarkNewObjectWithoutScope(b *testing.B) {
    ctx := NewIsolate().NewContext()
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        ctx.NewObjectWoutScope()
    }
}

func BenchmarkNewObjectWithScope(b *testing.B) {
    ctx := NewIsolate().NewContext()
    ctx.WithScope() // don't need the done chan here
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        ctx.NewObjectWScope()
    }
}
Results:
$ go test -benchmem -run="^$" -bench "^BenchmarkNewObject"
goos: darwin
goarch: amd64
pkg: github.com/augustoroman/v8
BenchmarkNewObjectWithoutScope-4 1000000 7915 ns/op 32 B/op 1 allocs/op
BenchmarkNewObjectWithScope-4 2000000 972 ns/op 32 B/op 1 allocs/op
PASS
ok github.com/augustoroman/v8 13.814s
Example usage:
package main

import (
    "fmt"

    "github.com/augustoroman/v8"
)

func main() {
    ctx := v8.NewIsolate().NewContext()
    done := ctx.WithScope()
    obj := ctx.NewObjectWScope()
    obj.SetWithScope("hello", ctx.NewObjectWScope())
    done <- struct{}{}

    b, _ := obj.MarshalJSON()
    fmt.Println(string(b)) // {"hello": {}}
}
edit: posted the comment too quickly by mistake; added code and examples.
For some reason, this doesn't work with all kinds of values. I'm trying to make the whole ctx.Create use one handle scope, but started getting fun errors:
#
# Fatal error in HandleScope::HandleScope
# Entering the V8 API without proper locking in place
#
SIGILL: illegal instruction
PC=0x4a37942 m=0 sigcode=1
goroutine 0 [idle]:
runtime: unknown pc 0x4a37942
stack: frame={sp:0x7fff5fbfecf8, fp:0x0} stack=[0x7fff5fb7fa10,0x7fff5fbfee90)
00007fff5fbfebf8: 0000000006000000 00007fff5fbfed30
00007fff5fbfec08: 0000000000000000 00007fff5fbfecf0
00007fff5fbfec18: 0000000004a38010 000001da54b03389
// ...
^ This happened while trying to create a Function (using a Bind function I made that doesn't acquire the scope), but I've seen it with a Date object too.
I was wrong: apparently this doesn't keep the Locker or any other scope alive. It appears you don't need the Locker or anything of the sort to create any kind of value except Date and Function.
I don't think I can prevent the Locker and scope destructors from firing when calling a different function. My C++ is pretty awful!
Update time: I was getting the scope errors because of thread switching.
I made a handleScope type, which essentially queues functions to run in scope. The Go ctx-scope callback locks the OS thread and waits for "jobs" to run.
type handleScope chan func()

func (hs handleScope) Do(f func()) {
    done := make(chan struct{}, 1)
    hs <- func() {
        f()
        done <- struct{}{}
    }
    <-done
}

func (hs handleScope) End() {
    close(hs)
}
Example usage:
func (ctx *Context) Create(val interface{}) (*Value, error) {
    scope := ctx.WithScope()
    var v *Value
    var err error
    scope.Do(func() {
        v, _, err = ctx.create(reflect.ValueOf(val))
    })
    scope.End()
    return v, err
}
This is not super fast for a single value; in fact, it's slower and allocates a bit more. For several values, however, it makes things faster.
func BenchmarkNewObjectWithoutScope(b *testing.B) {
    ctx := NewIsolate().NewContext()
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        ctx.NewObjectWoutScope()
        ctx.NewObjectWoutScope()
        ctx.NewObjectWoutScope()
        ctx.NewObjectWoutScope()
        ctx.NewObjectWoutScope()
        ctx.NewObjectWoutScope()
    }
}

func BenchmarkNewObjectWithScope(b *testing.B) {
    ctx := NewIsolate().NewContext()
    scope := ctx.WithScope()
    todo := func() {
        ctx.NewObjectWScope()
        ctx.NewObjectWScope()
        ctx.NewObjectWScope()
        ctx.NewObjectWScope()
        ctx.NewObjectWScope()
        ctx.NewObjectWScope()
    }
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        scope.Do(todo)
    }
}
$ go test -benchtime=10s -benchmem -run="^$" -bench "^BenchmarkNewObject"
goos: darwin
goarch: amd64
pkg: github.com/augustoroman/v8
BenchmarkNewObjectWithoutScope-4 1000000 65705 ns/op 192 B/op 6 allocs/op
BenchmarkNewObjectWithScope-4 1000000 20600 ns/op 320 B/op 8 allocs/op
PASS
ok github.com/augustoroman/v8 89.272s
There is a lot more lock contention with the scoped version: a single goroutine does a for range over the channel, waiting for jobs.
For my use cases, this is pretty interesting. I have complex objects to pass, and as soon as there are 2-3 sub-values needing creation, it can get pretty slow.
I can push to a branch if you're interested in seeing what's happening.
Ok, I woke up realizing that benchmark was unfair: I was not acquiring and releasing the scope during the benchmark, which the other one effectively does on every call.
With that in place, ns/op is very similar, but total ops is almost 10x lower with batch mode, which surprises me a lot! It is technically a few more cgo calls, but all in all, 2x less time is spent in cgo calls with batch mode. That time is spent elsewhere, though, usually in runtime-namespaced ops.
It's a bit hard to know which one is better from this very specific benchmark. I would think it'd be faster to prevent the lock and scope from changing threads all the time.