gopherjs icon indicating copy to clipboard operation
gopherjs copied to clipboard

GopherJS generics support

Open nevkontakte opened this issue 4 years ago • 25 comments

It appears that generic (a.k.a. type parameters support) is well underway in the upstream Go (dev.typeparams branch) and is likely to be included in 1.18. Which means GopherJS also needs to support this, ideally at the same time as Go 1.18, which I expect is a lot of work. Does anyone have even approximate sense of what it would take to support generics? @flimzy @dmitshur @neelance.

For now, I'm filing this issue to look for a volunteer (or several) to contribute this support. By myself I almost certainly won't be able to implement this, considering that we have several other important features from earlier releases to take care of, such as modules and embed.

Important TODOs:

  • [ ] Re-enable encoding/xml tests once generics are supported.
  • [ ] Re-enable checkStringParseRoundTrip() in net/netip/fuzz_test.go.
  • [ ] Remove compareSlices() override in go/doc/doc_test.go.
  • [ ] Re-enable TestIssue50208 in reflect package.
  • [ ] Verify pointers to arrays are wrapped correctly in generic functions.
  • [ ] [go1.19] Remove generics-related overrides in crypto/elliptic/nistec.go, crypto/internal/boring/bbig/big.go, crypto/internal/nistec/nistec_test.go, crypto/internal/nistec/wrapper.go, go/token/position.go, sync/atomic/atomic.go, sync/atomic/atomic_test.go, testing/helper_test.go, testing/helperfuncs_test.go
  • [ ] [go1.20] Remove generics-related overrides in compiler/natives/src/sync/map.go, compiler/natives/src/time/format.go, compiler/natives/src/time/format_rfc3339.go, compiler/natives/src/encoding/gob/gob.go, compiler/natives/src/internal/godebug/godebug.go, compiler/natives/src/net/http/http.go, compiler/natives/src/time/export_test.go

nevkontakte avatar Apr 05 '21 15:04 nevkontakte

I think it's planned for go1.18.

inliquid avatar Apr 09 '21 13:04 inliquid

I've been doing some research on potential ways to support generics in gopherjs and I think it would be good for me to share my thoughts. This isn't a proper design document, but I hope it would help us start a discussion and perhaps make it easier for people to contribute towards this :)

I'll start by sharing links to a few important resources I used:

These design docs deserve the most attention here, as they describe options the Go team considered for their own implementation. The first two docs present two ends of the spectrum: generate a specialized copy of executable code for each instantiation and generate one copy of universal code, delegating type-specific logic to helper functions passed via the dictionary. Ultimately, the Go team went with the hybrid approach as a trade off between binary size, compile time and complexity.

I think for GopherJS the tradeoff looks significantly differently. For us, output size is a critical concern, but things like stack allocation or memory management are much less relevant, since we delegate those to the javascript runtime. Also on a technical level, stenciling requires the ability to add function instantiations to the imported packages and deduplicate them at the linking stage. This is solvable, but GopherJS is not very well set up for this and will require a significant refactoring to make it possible. This makes me strongly favor the dictionaly-based design.

However, we can take advantage of our dynamic runtime environment and simplify our implementation even further by delaying generics instantiation to the runtime. That way, the compiler won't have to generate dictionaries and subdictionaries from the original design, or include them into the generated binary.

Consider this non-generic type:

type Foo struct{ f string }
func (f Foo) Bar() { println("bar") }

Here is a slightly abbreviated version of the code it compiles into:

Foo = $newType(0, $kindStruct, "main.Foo", true, "main", true, function(f_) { /* field initialization code */ });
Foo.ptr.prototype.Bar = function() { /* Foo.Bar() code */ };
Foo.methods = [{prop: "Bar", name: "Bar", pkg: "", typ: $funcType([], [], false)}];
Foo.init("main", [{prop: "f", name: "f", embedded: false, exported: false, typ: $String, tag: ""}]);

main = function() {
  var f = new Foo.ptr(""); // Using the type!
  $clone(f, Foo).Bar();
};

The important insight here is that we are already delaying type instantiation to the runtime, just before we begin program execution. Why not do the same for generic types, except do this when the generic type is demanded by the program?

Consider the following generic type:

type Foo[T any] struct{ f T }

func (f Foo[T]) Bar() { println("bar") }

Instead of a concrete instantiation we could generate a factory function for it:

Foo = function(T) {
  var instance = $newType(0, $kindStruct, "main.Foo[" + T.typeString +"]", true, "main", true, function(f_) { /* field initialization code */ });
  // Note that the method body closes over the type T and can use it for reflection/whatever purposes.
  instance.ptr.prototype.Bar = function() { /* Foo.Bar() code */ };
  instance.methods = [{prop: "Bar", name: "Bar", pkg: "", typ: $funcType([], [], false)}];
  // Note "typ: T" on the next line — using type parameter!
  instance.init("main", [{prop: "f", name: "f", embedded: false, exported: false, typ: T, tag: ""}]);
  return instance;
}

main = function() {
  var f = new (Foo($String).ptr)(""); // Calling the factory to get the constructor for the new operator.
  $clone(f, Foo($String)).Bar();
};

Of course, this is a simplified example. In the real implementation the factory function will cache and reuse instances of a generic type to improve performance and avoid "same but different" kinds of type conflicts. Similar approach could be applied to standalone generic functions.

Another issue we need to address is type-specific operations. The same operator may mean different things for different types, for example:

  • int32(1) + int32(2) compiles into 1 + 2 >> 0
  • "one" + "two" compiles into "one" + "two"

The original design solves this by introducing dictionaries that have a virtual table of functions that can be used with the generic type. In our implementation we can use the type instance itself in that role. For example we could define $String.add() and $Int32.add(), which could be called in the generic code as T.add(). Admittedly, this is likely to be worse performance that using the operator directly, but we could modify the compiler to only use these functions in the generic code, thus preventing performance hit in the existing non-generic code.

Finally, we also need to be able to construct types in generic code from passed type parameters, for example:

func Baz[T any]() { 
  x = Foo[[]T]{}
  //...
}

That could be compiled into something like this:

Baz = function(T) {
  return function() {
    x = new (Foo($silceType(T)))();
    // ...
  }
}

So that's as far as I got. It's getting late here, so apologies for all the typos I'm sure I've made in this text 😅 Inviting @flimzy, @paralin and any one else interested to comment.

nevkontakte avatar Aug 16 '22 22:08 nevkontakte

I created the new generics branch for this work and enabled the typeparam test suite from the Go repository in it. This should give us a decent idea for the level of compatibility we currently have.

@paralin are you still planning to work on it? I think I might start chipping away on this issue, and it would be a shame if we did duplicate work :) That said, I think there is plenty enough work for two people, if you are willing to help.

nevkontakte avatar Sep 18 '22 16:09 nevkontakte

@nevkontakte I haven't started looking into this yet, but I'm happy to help out where I can if you have something you need done.

paralin avatar Sep 18 '22 16:09 paralin

Since I haven't really started coding yet, I can't point at anything specific 😅 My general plan is to look at CI, pick a failing test case and try to make it pass, following the general approach I posted in the earlier comment. I might also use you as a reviewer for my own changes, hopefully that will give you ideas what to do next.

nevkontakte avatar Sep 18 '22 16:09 nevkontakte

@nevkontakte Sounds good! I haven't had any time to look at this yet

paralin avatar Sep 21 '22 23:09 paralin

I am now beginning to second-guess the wisdom of the approach I've described above...

On one hand it does the benefit of the smallest possible output size, which is important. On the other hand, it will make generics slower than normal function calls in numerous fundamental ways. Here is an list of performance issues I have discovered so far (there probably are more):

  • Calling a generic function will be two calls: first to a the generic function factory, then to the function itself.
  • When applied to generic types, operators like + will have to become virtual function calls because different primitive types implement them differently.
  • "Wrapped" types (e.g. types that gopherjs represents as JS native types at runtime, only wrapping in Go-specific object when necessary, such as type x int) will have to be wrapped in the function prologue, which is a memory allocation.
  • Each method call for a typeparameter-dependent object would have to be considered as a potentially blocking, with the corresponding overhead (like it is with interface method calls).

That said, going full monomorphization is also not easy from the compiler architecture perspective, and will cause obvious output size blowups.

nevkontakte avatar Nov 02 '22 17:11 nevkontakte

@nevkontakte I think it's worthwhile to consider that most users will end up running the code through a js optimizer at some point. So potentially two things: maybe code size isn't as important? Or possibly, are the optimizers capable of reducing the multiple calls to fewer, so it doesn't matter? Not sure which is true

paralin avatar Nov 02 '22 19:11 paralin

@paralin in our user survey output file size is by far the biggest pain point: image

So it's hard for me to argue that it isn't that important.

In general it's kind of hard to imagine how a generic JS optimizer can help with deduplicating monomorphized function bodies, since the will have to be semantically different from each other, even if similar. For multiple function calls my best bet is on the JIT compiler JS engines may have, but those work in mysterious ways and it's hard to tell would would be inlined away and what wouldn't.

A part of me thinks that it all would be way easier if GopherJS used SSA representation to generate the code, but that's a huge undertaking in itself, and there is a non-trivial chance that SSA would end up unsuitable in the end (particularly, transpiling SSA into a high-level language might lead to a more verbose and less readable JS code than we have today).

nevkontakte avatar Nov 02 '22 20:11 nevkontakte

@nevkontakte I see what you're saying. Yeah, that was what I was wondering - how much do these JS optimizers actually restructure the code to improve performance... Is it like GCC where there are structural optimizations going on, or is it just a simple uglify type pass?

From some research I guess today the "optimizers" of the JS world are very much in an early stage only: https://prepack.io/

So indeed it's important to optimize for these things at the code-gen time and not lean on post-processing.

Apologies for going down this side-road in this discussion!

paralin avatar Nov 02 '22 21:11 paralin

@paralin no worries, that's a reasonable question to ask :) I think Google Closure Compiler is one of the most advanced JS optimizers around, but I don't think many people use it. And it also relies heavily on type annotations (which we don't emit) to do more advanced optimizations like dead code elimination. Even with them, I don't think it can do anything like deduplication or inlining.

JavaScript's dynamic nature makes it really difficult for static analysis in general, so I don't expect to find a silver bullet at that level :(

That said, if you are interested in giving it a shot, you could work on a monomorphization-style generics support, while I continue of the current track. Once both implementations are reasonably functional we could do some comparisons and decide which one to polish and merge.

nevkontakte avatar Nov 02 '22 21:11 nevkontakte

@nevkontakte Would love to contribute that - at the moment am swamped with tasks on other projects (as usual) but hopefully someday I will have more time to work on gopherjs.

paralin avatar Nov 02 '22 21:11 paralin

"Wrapped" types are going to be one of the bigger challenges...

Background:

Normal, non-wrapped types are represented at runtime by JS objects created by gopherjs-generated constructor functions. For example:

type mySlice []int
...
x := mySlice{1}

...compiles into something like this:

mySlice = $pkg.mySlice = $newType(12, $kindSlice, "main.mySlice", true, "main", false, null);
x = new mySlice([1]);

Importantly, the created object would follow gopherjs layout rules and carry all the type metadata gopherjs expects it to have. If this type has methods, they will be in the prototype, callable as x.foo().

Now, consider the following type:

type num int
func (n num) foo() { println("num:", n) }
...
n := num(1)

In the end of the day, this is just an integer. It would be rather inefficient to allocate a whole object for it every time, especially if we consider computing complex arithmetic expressions or arrays. Enter "wrapped" types. Instead of representing them as custom types, gopherjs represents them as native js types, in this case number:

num = $pkg.num = $newType(4, $kindInt, "main.num", true, "main", false, null);
num.prototype.foo = function() {
	var n;
	n = this.$val;
	console.log("num:", n);
};
...
n = 1; // No type constructor call!

"Wrapping" is deferred until the later point, when we want to call a method: n.foo() compiles into new num(n).foo();.

A similar thing happens if we convert a wrapped type into an interface and back:

Go:

var a any = n
n2 := a.(num)

JS:

a = new num(n);
n2 = $assertType(a, num);

("Fun" aside: you may notice that gopherjs doesn't represent interfaces at runtime at all. Why? IDK.)

The problem

Wrapped and non-wrapped types present radically different runtime contracts. Operations like a method call or conversion to an interface are trivial for non-wrapped types, but they require type constructor call for the wrapped ones. Yet we have to generate one function body that must be able to handle both. Consider the example:

// //go:build ignore

package main

type iface interface {
	foo()
}

type num int

func (n num) foo() { println("num:", n) }

type st struct{ v int }

func (s st) foo() { println("st:", s.v) }

func generic[T iface](a T, factory func(T) T) {
	var b T
	c := factory(a)
	println(a, b, c)
	a.foo()
}

func main() {
	stFactory := func(x st) st { return st{1} }
	s := st{1}
	generic(s, stFactory)

	numFactory := func(x num) num { return 1 }
	n := num(1)
	generic(n, numFactory)
}

The generic() function compiles into this:

generic = function(T){
  return function(a, factory) {
    var {_r, a, b, c, factory, $s, $r, $c} = $restore(this, {a, factory});
    /* */ $s = $s || 0; s: while (true) { switch ($s) { case 0:
    b = T.zero();
    _r = factory(a); /* */ $s = 1; case 1: if($c) { $c = false; _r = _r.$blk(); } if (_r && _r.$blk !== undefined) { break s; }
    c = _r;
    console.log(a, b, c);
    $r = a.foo(); /* */ $s = 2; case 2: if($c) { $c = false; $r = $r.$blk(); } if ($r && $r.$blk !== undefined) { break s; }
    $s = -1; return;
    /* */ } return; } var $f = {$blk: generic, $c: true, $r, _r, a, b, c, factory, $s};return $f;
  };
};

Notice that a.foo() method call is done as if T was a non-wrapped type without extra constructor calls. Naturally, this program fails for the num type:

{ '$val': [Circular], v: 1 } { '$val': [Circular], v: 0 } { '$val': [Circular], v: 1 }
st: 1
1 0 1
TypeError: a.foo is not a function
    at wrapped.go:21:3
    at main (wrapped.go:31:3)
    at $init (wrapped.go.874275466:2668:9)
    at $goroutine (wrapped.go.874275466:1682:19)
    at $runScheduled (wrapped.go.874275466:1728:7)
    at $schedule (wrapped.go.874275466:1752:5)
    at $go (wrapped.go.874275466:1714:3)
    at Object.<anonymous> (wrapped.go.874275466:2680:1)
    at Object.<anonymous> (wrapped.go.874275466:2683:4)
    at Module._compile (internal/modules/cjs/loader.js:999:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)
    at Module.load (internal/modules/cjs/loader.js:863:32)
    at Function.Module._load (internal/modules/cjs/loader.js:708:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:60:12)
    at internal/main/run_main_module.js:17:47

I am not entirely sure how to address this. At first I considered force-wrapping all wrapped types when they are passed into a generic function, but then I realized there are too many ways a wrapped type can get passed in. It would also have to be "unwrapped" whenever it is passed into any other function, so getting this right is nearly impossible.

My current idea is to change how method calls work by replacing a.foo() in the example above with something like T.foo.call(a), though I'm sure it'll uncover other edge cases.

So I'm posting this braindump in case someone else has an idea what we could do here.

nevkontakte avatar Nov 02 '22 22:11 nevkontakte

Another gotcha I've discovered (although I don't have time to document it in as much detail as it deserves) is that the object representing a struct is also treated as a pointer to that struct. That also doesn't mesh well with generic code, which doesn't know that we are dealing with a struct and would create a pointer in the normal way, thus violating expectations of the non-generic code. Sigh.

nevkontakte avatar Nov 02 '22 22:11 nevkontakte

I keep discovering fascinating details about GopherJS's type representation. Above I mentioned that a struct and a pointer to a struct are the same thing. Well, this isn't quite true.

Technically, there is no struct value representation, only a pointer to a struct. This is how a new struct type is defined in JS:

https://github.com/gopherjs/gopherjs/blob/3726a5e828fefec1bd98d4cab84220caf961fb43/compiler/prelude/types.go#L240-L246

Notice that the constructor function is passed to the typ.ptr = $newType(4, $kindPtr, ..., constructor);. Indeed, s := st{1} (where st is a struct type) compiles into s = new st.ptr(1);. But then it continues to act as it of was a value type, employing occasional $clone calls to work around JS's reference semantics. Intuitively you'd expect that the struct type is a JS object and a pointer to a struct is something lightweight, but in reality the pointer is the JS object, and struct value is not really a thing. Or more accurately, the struct value is a wrapped type, which wraps a pointer to the struct ¯\_(ツ)_/¯, but the rest of the time is represented by the pointer.

I suppose the idea behind this is that pass-by-value semantics in JS is really a lot like a pointer, but I wish this was documented.

nevkontakte avatar Nov 03 '22 13:11 nevkontakte

I think I found a solution to the wrapped types problem: https://github.com/nevkontakte/gopherjs/commit/776d188ca55625452195d979a161e327bc902d34. I am not sure it covers 100% of edge cases, but it seems to hold up for the ones I could think of. I also hope that JIT would be able to inline T.wrap() calls well.

nevkontakte avatar Nov 03 '22 16:11 nevkontakte

Heads-up to anyone following this issue, I have rebased the generics branch on top of the latest mater and force-pushed it into the repository. If you have it checked out, beware.

nevkontakte avatar Dec 04 '22 17:12 nevkontakte

@nevkontakte If I wanted to try this out in my project, how would I build the gopherjs binary from your fork? I get:

❯ go install github.com/nevkontakte/gopherjs@d5b4c07
go: downloading github.com/nevkontakte/gopherjs v1.17.3-0.20230127213519-d5b4c0738978
go: github.com/nevkontakte/gopherjs@d5b4c07: github.com/nevkontakte/[email protected]: parsing go.mod:
	module declares its path as: github.com/gopherjs/gopherjs
	        but was required as: github.com/nevkontakte/gopherjs

For the project-level dependency, I was able to use a replace directive, but there doesn't seem to be an equivalent for GOPATH-level binaries. I can clone and go build . manually, but is there a nicer way?

Apologies for derailing this conversation. It's been a while since I've done any Go development so I'm out of the loop.

timdp avatar Feb 08 '23 10:02 timdp

@timdp you don't need to use my fork, everything that's in a semi-working condition is merged into the generics branch in this repo. So go install github.com/gopherjs/gopherjs@generics should work.

However, be aware that the implementation is very incomplete. Most notably, operators won't work with generic variable types, so typical use cases for generics like a min[T constraints.Integer](a, b T) won't work. And the part that's implemented has known bugs. So feel free to play around, but I wouldn't recommend to use it in production just yet :)

nevkontakte avatar Feb 08 '23 19:02 nevkontakte

I see, thanks! It did seem a bit early, but you never know. 😁

In case you're interested in my use case:

Details

I'm trying to compile the Prometheus PromQL parser to JS.

Sample code:

package main

import (
	"github.com/gopherjs/gopherjs/js"
	"github.com/prometheus/prometheus/promql/parser"
)

func main() {
	js.Global.Set("foo", js.MakeWrapper(func(query string) (string, error) {
		expr, err := parser.ParseExpr(query)
		if err != nil {
			return "", err
		}
		return expr.String(), nil
	}))
}

With v1.18.0-beta2, this fails as expected with:

panic: interface conversion: ast.Expr is *ast.IndexExpr, not *ast.Ident

With generics, I'm currently getting:

[compiler panic]  interface conversion: types.Type is *types.Interface, not *types.Basic

Occurred while compiling statement at /Users/timdp/go/pkg/mod/github.com/prometheus/[email protected]/model/histogram/generic.go:137:3:
  if currentBucketAbsolute == 0 {
    nothingToDo = false
    break
  }

timdp avatar Feb 09 '23 11:02 timdp

@timdp yep, that's an error I'd expect to happen until operator support is implemented. That's the major missing piece right now.

If you are in a hurry and wasm is acceptable for you, I think TinyGo supports generics already, you could try that.

nevkontakte avatar Feb 11 '23 13:02 nevkontakte

Unfortunately, that one fails because it doesn't have runtime/metrics. But we're on the same page. 😁

timdp avatar Feb 11 '23 14:02 timdp

I guess now would be a good time for a small confession... The longer I've been working on a generics implementation I described here, the less happy I was with it.

There were two and a half major problems with it:

  • The biggest one is that generics became a gigantic special case in the compiler. A lot of type-specific stuff that the compiler would normally be able to handle for non-generic code had to be replicated in the runtime for generics. The half-reason is that this meant prelude was getting bigger and more complex -> the output size grew, even if generics were not be used.
  • Generics would would be unavoidably slower than non-generic one, which is both sad and counter-intuitive. To illustrate why, consider integer addition: if we know we are working with uint8 we can generate something like (x+y)%256 , which is quite cheap; however, if we are working with generics, it would have to be T.$add(x, y), where T is a runtime reference to uint8 type. The latter is considerably more expensive because it now involves a dynamic method call. We can hope that JIT will eventually inline it, but that's not a given.

So, for the last couple of months I've been working on an alternate approach, which is now https://github.com/gopherjs/gopherjs/pull/1272. In the good news, it's passing nearly all generic-related test cases from the Go repo. However, there is a bit more work before we can merge that implementation. So, stay tuned.

nevkontakte avatar Feb 25 '24 01:02 nevkontakte

@flimzy how do you feel about merging the generics-ng branch into master in its current state, gating generics support behind some sort of GOPHERJS_EXPERIMENT flag? Generics are not production-ready yet, but I think it's very unlikely we'd take any major direction changes in the implementation. I am currently looking into wiring cross-package generic instances and it will likely require some non-trivial refactoring on the build package and tool.go side to do it cleanly. Doing that refactoring in an isolated branch is likely to turn into a merge conflict headache, especially with the Go 1.20 work @grantnelson-wf is doing. Moving most of the generic development into master would make it easier to keep 1.20 work up to date with it.

nevkontakte avatar Mar 16 '24 16:03 nevkontakte

Sounds reasonable to merge given the rationale above. +1

flimzy avatar Mar 16 '24 19:03 flimzy