numgo+ vs. numpy
From my experience with PaddlePaddle and ElasticDL, I believe it's necessary to have a new programming language that can replace Python, and Go+ is on the right track.
In the hope of helping to establish a community, I am trying to get the following program working, so that we could have something like `numgo+` that mimics Python plus `numpy`.
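import (
	"log"

	"gonum.org/v1/gonum/mat"
)

// NewMat copies a non-empty, rectangular 2D slice into a dense matrix.
func NewMat(x [][]float64) *mat.Dense {
	if len(x) == 0 || len(x[0]) == 0 {
		log.Fatalf("NewMat expects a 2D float64 non-empty slice. However, len(x)=%d", len(x))
	}
	// mat.NewDense takes row-major backing data as its third argument.
	data := make([]float64, 0, len(x)*len(x[0]))
	for _, row := range x {
		data = append(data, row...)
	}
	return mat.NewDense(len(x), len(x[0]), data)
}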
func EmptyMat(h, w int) *mat.Dense {
return mat.NewDense(h, w, nil)
}
m := [[1,2],
[3,4],
[5,6]] // [][]int
a := NewMat(m)
b := NewMat(m)
c := EmptyMat(a.Dims())
c.Mul(a, b)
Currently, `qrun .` panics with `toExternalType: todo`. This seems to be due to the incompleteness of Go+:
https://github.com/qiniu/goplus/blob/f820876458b4ff3fb6aee46df17e0e2a1890e337/cl/type_decl.go#L228-L230
I will try to contribute a fix to make this work. If anyone else gets there faster than me, that would be highly appreciated.
Cool.
I suggest you split your work into small enhancements so that you can make pull requests frequently. This will make our cooperation go smoothly.
Sure, I will.
I am still learning your source code, trying to understand the typing thing. Once I am ready, I will file a design PR for your review before coding. With the design confirmed, I will file a sequence of small PRs to change the source code.
Proposal: `numgo+` and GoTorch Based on Go+
Go+ simplifies Go syntax in a way that is good for data science. To help the idea prosper and grow a community around it, I propose founding two new projects, `numgo+` and `gotorch`, just as `numpy` and PyTorch are built upon Python.
Python-based stack | Go+-based stack |
---|---|
PyTorch | GoTorch |
numpy | numgo+ |
Python | Go+ |
Based on my experience as a former leader of Baidu PaddlePaddle and a senior staff data scientist at LinkedIn, I personally would prefer the Go+-based tech stack for the following reasons:
- Go+ is compiled, which implies higher runtime efficiency.
- Go+, like Go, is strongly and statically typed, which makes programs less error-prone at runtime.
- Go+ inherits Go's strict syntax, which usually means there is only one way to implement an idea, whereas in Python you can write a dozen different code snippets that all implement the same idea. Such strictness makes a codebase tolerant of contributors with varied backgrounds, mindsets, and skill levels.
numgo+
This document focuses on `numgo+`, which could be the basis of the proposed GoTorch, just as `numpy` provides the basic data type for PyTorch.
At the heart of `numpy` is a data type, `ndarray`, that encapsulates a tensor. I propose `numgoplus.ndarray` as a counterpart with a compatible API, to ease the migration from `numpy` to `numgo+`.
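To make the idea of a type that encapsulates a tensor concrete, here is a minimal sketch in plain Go. It assumes a row-major layout with a flat buffer, a shape, and per-axis strides. The names `Ndarray`, `New`, `At`, and `Set` are hypothetical and are not part of any existing `numgo+` API.

```go
package main

import "fmt"

// Ndarray is a hypothetical row-major tensor: a flat buffer plus a shape.
type Ndarray struct {
	data    []float64
	shape   []int
	strides []int // element (not byte) strides, one per axis
}

// New allocates a zero-filled tensor with the given shape.
func New(shape ...int) *Ndarray {
	size := 1
	strides := make([]int, len(shape))
	for i := len(shape) - 1; i >= 0; i-- {
		strides[i] = size
		size *= shape[i]
	}
	return &Ndarray{
		data:    make([]float64, size),
		shape:   append([]int(nil), shape...),
		strides: strides,
	}
}

// offset maps a multi-dimensional index to a position in the flat buffer.
func (a *Ndarray) offset(idx ...int) int {
	if len(idx) != len(a.shape) {
		panic("index rank does not match array rank")
	}
	off := 0
	for i, v := range idx {
		if v < 0 || v >= a.shape[i] {
			panic("index out of range")
		}
		off += v * a.strides[i]
	}
	return off
}

// At returns the element at the given index.
func (a *Ndarray) At(idx ...int) float64 { return a.data[a.offset(idx...)] }

// Set stores v at the given index.
func (a *Ndarray) Set(v float64, idx ...int) { a.data[a.offset(idx...)] = v }

func main() {
	a := New(2, 3)
	a.Set(7, 1, 2)
	fmt.Println(a.At(1, 2), a.strides) // 7 [3 1]
}
```

Keeping strides separate from the shape is what would later allow operations such as transposition to reuse the same buffer, though that is beyond this sketch.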
Array Creation
The following example comes from the official `numpy` tutorial. The proposed `numgo+` counterpart is to the right.
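[Side-by-side listing: `numpy` on the left, the proposed `numgo+` counterpart on the right; code not preserved.]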
Array Literals
In Go, we write array/slice literals with an explicit type.
a := [][]float64{
{1.0, 2.0, 3.0},
{1.0, 2.0, 3.0}}
Go+ can automatically derive the type from the element literals, which enables a much simpler form that looks like Python.
a := [[1.0, 2.0, 3.0],
[1.0, 2.0, 3.0]]
This lets `numgo+` offer an array-literal API like that of `numpy`.
[Side-by-side listing comparing array literals in `numpy` and `numgo+`; code not preserved.]
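Whichever literal syntax is used, a constructor that accepts nested slices would presumably have to check that the rows form a rectangle before flattening them. The following plain-Go sketch illustrates that step; `FromRows` is a hypothetical helper name, not an existing API.

```go
package main

import (
	"errors"
	"fmt"
)

// FromRows is a hypothetical helper that validates a nested slice and
// flattens it into row-major order, returning the data and its dimensions.
func FromRows(rows [][]float64) (data []float64, h, w int, err error) {
	if len(rows) == 0 {
		return nil, 0, 0, errors.New("FromRows: empty input")
	}
	w = len(rows[0])
	data = make([]float64, 0, len(rows)*w)
	for i, r := range rows {
		if len(r) != w {
			return nil, 0, 0, fmt.Errorf("FromRows: row %d has %d elements, want %d", i, len(r), w)
		}
		data = append(data, r...)
	}
	return data, len(rows), w, nil
}

func main() {
	// Go allows the inner element type to be elided, but not the outer one.
	data, h, w, err := FromRows([][]float64{{1, 2, 3}, {1, 2, 3}})
	fmt.Println(data, h, w, err) // [1 2 3 1 2 3] 2 3 <nil>
}
```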
I came up with the same idea; when you posted this, I was having my breakfast.
github.com/goplus/numgoplus => github.com/numgoplus/ng
In Go+ we will support a feature named auto property. It means:
import ng "github.com/goplus/numgoplus"
a := ng.arange(15).reshape(3, 5)
fmt.Println(a)
// array([[ 0, 1, 2, 3, 4],
// [ 5, 6, 7, 8, 9],
// [10, 11, 12, 13, 14]])
fmt.Println(a.Shape()) // (3, 5)
fmt.Println(a.Ndim()) // 2
fmt.Println(a.Dtype().Name()) // 'int64'
fmt.Println(a.Itemsize()) // 8
fmt.Println(a.Size()) // 15
fmt.Println(reflect.TypeOf(a)) // numgoplus.Ndarray
b := ng.array([6, 7, 8])
fmt.Println(b)
// array([6, 7, 8])
fmt.Println(reflect.TypeOf(b)) // numgoplus.Ndarray
can be:
import "github.com/numgoplus/ng"
a := ng.arange(15).reshape(3, 5)
println(a)
// array([[ 0, 1, 2, 3, 4],
// [ 5, 6, 7, 8, 9],
// [10, 11, 12, 13, 14]])
println(a.shape) // (3, 5)
println(a.ndim) // 2
println(a.dtype.name) // 'int64'
println(a.itemsize) // 8
println(a.size) // 15
println(reflect.typeOf(a)) // ng.Ndarray
b := ng.array([6, 7, 8])
println(b)
// array([6, 7, 8])
println(reflect.typeOf(b)) // ng.Ndarray
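For reference, the accessor methods that an auto property such as `a.shape` would resolve to could look like the following in plain Go. This is only a sketch under assumptions: the `Ndarray` type, its int64 storage, and the exported names `Arange` and `Reshape` are hypothetical stand-ins for the `ng` package.

```go
package main

import "fmt"

// Ndarray is a hypothetical int64 tensor used only for illustration.
type Ndarray struct {
	data  []int64
	shape []int
}

// Arange returns a 1-D array holding 0, 1, ..., n-1.
func Arange(n int) *Ndarray {
	d := make([]int64, n)
	for i := range d {
		d[i] = int64(i)
	}
	return &Ndarray{data: d, shape: []int{n}}
}

// Reshape returns an array with a new shape that shares the same data.
func (a *Ndarray) Reshape(shape ...int) *Ndarray {
	size := 1
	for _, s := range shape {
		size *= s
	}
	if size != len(a.data) {
		panic(fmt.Sprintf("cannot reshape array of size %d into shape %v", len(a.data), shape))
	}
	return &Ndarray{data: a.data, shape: append([]int(nil), shape...)}
}

// Accessor methods; with auto property, a.shape would mean a.Shape().
func (a *Ndarray) Shape() []int  { return a.shape }
func (a *Ndarray) Ndim() int     { return len(a.shape) }
func (a *Ndarray) Size() int     { return len(a.data) }
func (a *Ndarray) Itemsize() int { return 8 } // bytes per int64 element

func main() {
	a := Arange(15).Reshape(3, 5)
	fmt.Println(a.Shape(), a.Ndim(), a.Size(), a.Itemsize()) // [3 5] 2 15 8
}
```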
In Go+ we have a simplified form for 2D vectors. It means:
b := ng.array([[1.5,2,3], [4,5,6]])
// array([[1.5, 2. , 3. ],
// [4. , 5. , 6. ]])
can be:
b := ng.array([1.5,2,3; 4,5,6])
// array([[1.5, 2. , 3. ],
// [4. , 5. , 6. ]])
The simplified form of 2D vectors goes a step further than Python. It is close to MATLAB syntax. Great idea!
Here is a typical PyTorch program in four different languages:
- The Python version comes from the official tutorial.
- The C++ version calls the ATen C library and Torch's `csrc` C++ library. Thanks to Jia-Kai Liu, a tech lead of PyTorch, for teaching me everything about the C/C++ core of PyTorch. Please follow the instructions in https://github.com/wangkuiyi/cxxtorch to run this program.
- The Go version calls an imaginary Go binding of ATen and `csrc`.
- The Go+ version is also imaginary.
[Four side-by-side listings appeared here: C++ and Go, then Go+ and Python; code not preserved.]
From the above four programs, we can see:
- The Go binding could be as effective as the C/C++ API, in terms of the number of lines of source code.
- If we want the Go+ version to be as concise as the Python version, the primary requirement for the Go+ transpiler is to support named function parameters. For example, `x := torch.RandN(N, Din, requires_grad=False)`.
In Go+, we can write it as follows:
package main
import (
"fmt"
"github.com/gotorch/gotorch/at"
"github.com/gotorch/gotorch/torch"
"github.com/gotorch/gotorch/torch/optim"
)
N, Din, H, Dout := 64, 1000, 100, 10
x := torch.RandN(N, Din, {})
y := torch.RandN(N, Dout, {})
w1 := torch.RandN(Din, H, {RequiresGrad: true})
w2 := torch.RandN(H, Dout, {RequiresGrad: true})
learningRate := 1e-3
adam := optim.NewAdam([w1, w2], {LR: learningRate})
for i := 0; i < 500; i++ {
yPred := at.MM(at.Clamp(at.MM(x, w1), 0), w2) // forward pass: clamp(x*w1, min=0) * w2
loss := at.Sum(at.Pow(at.Sub(yPred, y), 2))
if i%100 == 0 {
fmt.Println("loss = ", loss)
}
adam.ZeroGrad()
loss.Backward()
adam.Step()
}
New language features:
- https://github.com/goplus/gop/issues/486: Struct type of function calling parameter can be automatically deduced.
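For comparison, here is roughly what the same call looks like in plain Go, where the struct type has to be spelled out at every call site. This is a sketch under assumptions: `TensorOptions`, `Tensor`, and this `RandN` are hypothetical stand-ins, and `RandN` only records the shape and options instead of generating random values.

```go
package main

import "fmt"

// TensorOptions is a hypothetical options struct standing in for
// Python-style keyword arguments such as requires_grad.
type TensorOptions struct {
	RequiresGrad bool
}

// Tensor is a stand-in type that only records its shape and options.
type Tensor struct {
	Rows, Cols   int
	RequiresGrad bool
}

// RandN records the requested shape and options. Filling the tensor with
// random values is omitted to keep the sketch short.
func RandN(rows, cols int, opts TensorOptions) *Tensor {
	return &Tensor{Rows: rows, Cols: cols, RequiresGrad: opts.RequiresGrad}
}

func main() {
	// Plain Go needs the explicit type name; the Go+ feature would allow
	// writing {} and {RequiresGrad: true} instead.
	x := RandN(64, 1000, TensorOptions{})
	w1 := RandN(1000, 100, TensorOptions{RequiresGrad: true})
	fmt.Println(x.RequiresGrad, w1.RequiresGrad) // false true
}
```

The zero value of the options struct doubles as the set of defaults, which is why an empty literal is enough for `x`.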
There's still a problem: `libtorch` uses exceptions as its main error-handling mechanism, which has two consequences:
- We have to find a way to pass C++ exceptions to Go, as described in https://artem.krylysov.com/blog/2017/04/13/handling-cpp-exceptions-in-go/
  - Can Go+ provide a more efficient way to do the same thing?
- Taking error handling into consideration, the Go binding may not be as effective as the C/C++ API in terms of the number of lines of source code, because we have to check the error status of each line.
  - Go+ already provides a neat error-handling syntax. How can we leverage the Go+ mechanism to simplify error handling in the Go binding?
Define error type: CppError
type CppError struct {
what string
}
func (p *CppError) Error() string {
return p.what
}
func NewCppError(what string) error {
return &CppError{what: what}
}
Wrap functions that may throw C++ exceptions
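// C++ side, in a separate C++ source file: cgo preambles are compiled as C,
// so try/catch cannot live in the preamble itself.
//
//	extern "C" OutputArgs XXX_Wrap(InputArgs input, char **pwhat) {
//		try {
//			return XXX(input);
//		} catch (const std::exception &e) {
//			*pwhat = strdup(e.what()); // freed by the Go caller
//			return OutputArgs();
//		}
//	}

/*
#include <stdlib.h>
OutputArgs XXX_Wrap(InputArgs input, char **pwhat);
*/
import "C"
import "unsafe"

func XXX(input InputArgs) (output OutputArgs, err error) {
	var what *C.char // stays nil unless the C++ side caught an exception
	output = C.XXX_Wrap(input, &what)
	if what != nil {
		err = NewCppError(C.GoString(what))
		C.free(unsafe.Pointer(what))
	}
	return
}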