
numgo+ vs. numpy

Open · wangkuiyi opened this issue 4 years ago · 13 comments

From my experience with PaddlePaddle and ElasticDL, I believe we need a new programming language that can replace Python, and Go+ is on the right track.

In the hope of helping to establish a community, I am trying to get the following program working so that we could have something like numgo+ that mimics Python+numpy.

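import (
    "fmt"
    "log"

    "gonum.org/v1/gonum/mat"
)

// NewMat copies a non-empty, rectangular [][]float64 into a row-major *mat.Dense.
func NewMat(x [][]float64) *mat.Dense {
    if len(x) == 0 || len(x[0]) == 0 {
        log.Fatalf("NewMat expects a non-empty 2D float64 slice. However, len(x)=%d", len(x))
    }
    rows, cols := len(x), len(x[0])
    data := make([]float64, 0, rows*cols)
    for _, row := range x {
        if len(row) != cols {
            log.Fatalf("NewMat expects every row to have %d elements, got %d", cols, len(row))
        }
        data = append(data, row...)
    }
    return mat.NewDense(rows, cols, data)
}

// EmptyMat returns an h-by-w matrix of zeros.
func EmptyMat(h, w int) *mat.Dense {
    return mat.NewDense(h, w, nil)
}

m := [[1.0, 2.0],
      [3.0, 4.0],
      [5.0, 6.0]]  // [][]float64

a := NewMat(m)                 // 3x2
b := NewMat(m)                 // 3x2
c := EmptyMat(len(m), len(m))  // a * b^T is 3x3
c.Mul(a, b.T())
fmt.Println(mat.Formatted(c))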

Currently, running qrun . panics with "toExternalType: todo". This seems to be caused by a part of Go+ that is not implemented yet:

https://github.com/qiniu/goplus/blob/f820876458b4ff3fb6aee46df17e0e2a1890e337/cl/type_decl.go#L228-L230

I will try to contribute to make this work. If anyone else can get there faster than me, it would be highly appreciated.

wangkuiyi, Jun 21 '20

Cool.

xushiwei, Jun 21 '20

I suggest splitting your work into small enhancements so that you can make pull requests frequently. This will make our cooperation smoother.

xushiwei, Jun 21 '20

I suggest splitting your work into small enhancements so that you can make pull requests frequently.

Sure, I will.

I am still reading your source code, trying to understand how the typing works. Once I am ready, I will file a design PR for your review before coding. Once the design is confirmed, I will file a sequence of small PRs to change the source code.

wangkuiyi, Jun 21 '20

Proposal: numgo+ and GoTorch Based on Go+

Go+ simplifies Go syntax in a way that suits data science. To help this idea prosper and build a community around it, I propose founding two new projects, numgo+ and GoTorch, which would build on Go+ just as numpy and PyTorch build on Python.

Python-based stack | Go+-based stack
PyTorch            | GoTorch
numpy              | numgo+
Python             | Go+

Based on my experience as a former leader of Baidu PaddlePaddle and a senior staff data scientist at LinkedIn, I would personally prefer the Go+-based tech stack for the following reasons:

  • Go+ is compiled, which implies higher runtime efficiency.
  • Go+, like Go, is strongly and statically typed, which makes programs less error-prone at runtime.
  • Go+ inherits Go's strict syntax, so there is usually only one way to implement an idea, whereas in Python you can write dozens of different code snippets that all implement the same idea. Such strict syntax makes a codebase tolerant of many contributors with various backgrounds, mindsets, and skill levels.

numgo+

This document focuses on numgo+, which could serve as the basis of the proposed GoTorch, just as numpy provides the basic data type for PyTorch.

At the heart of numpy is the data type ndarray, which encapsulates a tensor. I propose numgoplus.ndarray as its counterpart, with a compatible API to ease migration from numpy to numgo+.

Array Creation

The following example comes from the official numpy tutorial. The proposed numgo+ counterpart follows it.

import numpy as np
a = np.arange(15).reshape(3, 5)
print(a)
# array([[ 0,  1,  2,  3,  4],
#        [ 5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14]])
print(a.shape) # (3, 5)
print(a.ndim)  # 2
print(a.dtype.name) # 'int64'
print(a.itemsize) # 8
print(a.size) # 15
print(type(a)) # <class 'numpy.ndarray'>
b = np.array([6, 7, 8])
print(b)
# array([6, 7, 8])
print(type(b)) # <class 'numpy.ndarray'>

import (
	"fmt"
	"reflect"

	ng "github.com/goplus/numgoplus"
)
a := ng.arange(15).reshape(3, 5)
fmt.Println(a)
// array([[ 0,  1,  2,  3,  4],
//        [ 5,  6,  7,  8,  9],
//        [10, 11, 12, 13, 14]])
fmt.Println(a.Shape()) // (3, 5)
fmt.Println(a.Ndim())  // 2
fmt.Println(a.Dtype().Name()) // 'int64'
fmt.Println(a.Itemsize()) // 8
fmt.Println(a.Size()) // 15
fmt.Println(reflect.TypeOf(a)) // numgoplus.ndarray
b := ng.array([6, 7, 8])
fmt.Println(b)
// array([6, 7, 8])
fmt.Println(reflect.TypeOf(b)) // numgoplus.ndarray
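To make the proposed API a little more concrete, here is a minimal plain-Go sketch of an ndarray-like type. Ndarray, Arange, Reshape, At and the other names are hypothetical illustrations, not the numgoplus API; the type only holds int64 data in row-major order, matching the int64 array in the example above.

```go
package main

import "fmt"

// Ndarray is a hypothetical, minimal int64 tensor used only to illustrate
// the proposed numgo+ API; it is not the numgoplus package.
type Ndarray struct {
	shape []int
	data  []int64 // row-major storage
}

// Arange returns a 1-D array holding 0, 1, ..., n-1 (like np.arange(n)).
func Arange(n int) *Ndarray {
	data := make([]int64, n)
	for i := range data {
		data[i] = int64(i)
	}
	return &Ndarray{shape: []int{n}, data: data}
}

// Reshape returns a new header with the given shape over the same data.
// It panics if the new shape does not hold exactly Size() elements.
func (a *Ndarray) Reshape(shape ...int) *Ndarray {
	n := 1
	for _, d := range shape {
		n *= d
	}
	if n != len(a.data) {
		panic(fmt.Sprintf("cannot reshape array of size %d into shape %v", len(a.data), shape))
	}
	return &Ndarray{shape: append([]int(nil), shape...), data: a.data}
}

func (a *Ndarray) Shape() []int  { return append([]int(nil), a.shape...) }
func (a *Ndarray) Ndim() int     { return len(a.shape) }
func (a *Ndarray) Size() int     { return len(a.data) }
func (a *Ndarray) Itemsize() int { return 8 } // bytes per int64 element

// At returns the element at a multi-dimensional index using the
// row-major offset formula off = off*shape[i] + idx[i].
func (a *Ndarray) At(idx ...int) int64 {
	off := 0
	for i, v := range idx {
		off = off*a.shape[i] + v
	}
	return a.data[off]
}

func main() {
	a := Arange(15).Reshape(3, 5)
	fmt.Println(a.Shape(), a.Ndim(), a.Size(), a.Itemsize()) // [3 5] 2 15 8
	fmt.Println(a.At(2, 4))                                  // 14
}
```

Reshape shares the underlying slice instead of copying it, which loosely mirrors numpy returning a view from reshape when it can.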

Array Literals

In Go, we write array/slice literals with an explicit type.

a := [][]float64{
	{1.0, 2.0, 3.0},
	{1.0, 2.0, 3.0}}

Go+ can automatically derive the type from the element literals, which enables a much simpler, Python-like syntax.

a := [[1.0, 2.0, 3.0],
      [1.0, 2.0, 3.0]]

This gives numgo+ a numpy-like API for creating arrays from literals.

b = np.array([[1.5,2,3], [4,5,6]])
# array([[1.5, 2. , 3. ],
#        [4. , 5. , 6. ]])
b := ng.array([[1.5,2,3], [4,5,6]])
// array([[1.5, 2. , 3. ],
//        [4. , 5. , 6. ]])
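The plain-Go sketch below shows what an array constructor has to do with a 2-D literal: copy the rows into one row-major buffer and record the shape. Dense2D and FromRows are hypothetical names used only for illustration; the explicit [][]float64 type is exactly what Go+ would infer.

```go
package main

import (
	"errors"
	"fmt"
)

// Dense2D is a hypothetical row-major float64 matrix standing in for an
// ndarray built from a 2-D literal; it is not the numgo+ API.
type Dense2D struct {
	Rows, Cols int
	Data       []float64
}

// FromRows flattens a [][]float64 literal into row-major storage,
// rejecting empty or ragged input.
func FromRows(rows [][]float64) (*Dense2D, error) {
	if len(rows) == 0 || len(rows[0]) == 0 {
		return nil, errors.New("empty 2-D literal")
	}
	cols := len(rows[0])
	data := make([]float64, 0, len(rows)*cols)
	for i, r := range rows {
		if len(r) != cols {
			return nil, fmt.Errorf("row %d has %d elements, want %d", i, len(r), cols)
		}
		data = append(data, r...)
	}
	return &Dense2D{Rows: len(rows), Cols: cols, Data: data}, nil
}

func main() {
	// In Go the element type must be spelled out; Go+ would infer it.
	b, err := FromRows([][]float64{{1.5, 2, 3}, {4, 5, 6}})
	if err != nil {
		panic(err)
	}
	fmt.Println(b.Rows, b.Cols, b.Data) // 2 3 [1.5 2 3 4 5 6]
}
```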

wangkuiyi, Jul 02 '20

I came up with the same idea; when you posted this, I was having breakfast.

model-collapse, Jul 02 '20

github.com/goplus/numgoplus => github.com/numgoplus/ng

xushiwei, Jul 02 '20

In Go+ we will support a feature named auto property. It means that the following code:

import (
	"fmt"
	"reflect"

	ng "github.com/goplus/numgoplus"
)

a := ng.arange(15).reshape(3, 5)
fmt.Println(a)
// array([[ 0,  1,  2,  3,  4],
//        [ 5,  6,  7,  8,  9],
//        [10, 11, 12, 13, 14]])
fmt.Println(a.Shape()) // (3, 5)
fmt.Println(a.Ndim())  // 2
fmt.Println(a.Dtype().Name()) // 'int64'
fmt.Println(a.Itemsize()) // 8
fmt.Println(a.Size()) // 15
fmt.Println(reflect.TypeOf(a)) // numgoplus.Ndarray

b := ng.array([6, 7, 8])
fmt.Println(b)
// array([6, 7, 8])
fmt.Println(reflect.TypeOf(b)) // numgoplus.Ndarray

can be written as:

import "github.com/numgoplus/ng"

a := ng.arange(15).reshape(3, 5)
println(a)
// array([[ 0,  1,  2,  3,  4],
//        [ 5,  6,  7,  8,  9],
//        [10, 11, 12, 13, 14]])
println(a.shape) // (3, 5)
println(a.ndim)  // 2
println(a.dtype.name) // 'int64'
println(a.itemsize) // 8
println(a.size) // 15
println(reflect.typeOf(a)) // ng.Ndarray

b := ng.array([6, 7, 8])
println(b)
// array([6, 7, 8])
println(reflect.typeOf(b)) // ng.Ndarray
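Auto property is a Go+ compiler feature, so it cannot be written in Go syntax, but the name resolution it implies can be simulated at run time with reflection. Following the example above, this sketch assumes that a lowercase property such as shape resolves to a zero-argument method Shape with one result; Array and autoProperty are hypothetical, and Go+ would do this lookup at compile time rather than via reflect.

```go
package main

import (
	"fmt"
	"reflect"
	"unicode"
)

// Array is a hypothetical stand-in for ng.Ndarray with Go-style getters.
type Array struct{ shape []int }

func (a *Array) Shape() []int { return a.shape }
func (a *Array) Ndim() int    { return len(a.shape) }

// autoProperty capitalizes the property name, finds a method with no
// parameters and exactly one result, calls it, and returns that result.
func autoProperty(recv any, name string) (any, error) {
	if name == "" {
		return nil, fmt.Errorf("empty property name")
	}
	r := []rune(name)
	r[0] = unicode.ToUpper(r[0])
	m := reflect.ValueOf(recv).MethodByName(string(r))
	if !m.IsValid() || m.Type().NumIn() != 0 || m.Type().NumOut() != 1 {
		return nil, fmt.Errorf("no property %q on %T", name, recv)
	}
	return m.Call(nil)[0].Interface(), nil
}

func main() {
	a := &Array{shape: []int{3, 5}}
	shape, _ := autoProperty(a, "shape")
	ndim, _ := autoProperty(a, "ndim")
	fmt.Println(shape, ndim) // [3 5] 2
}
```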

xushiwei, Jul 02 '20

In Go+ we have a simplified form for 2D vectors. It means that the following code:

b := ng.array([[1.5,2,3], [4,5,6]])
// array([[1.5, 2. , 3. ],
//        [4. , 5. , 6. ]])

can be written as:

b := ng.array([1.5,2,3; 4,5,6])
// array([[1.5, 2. , 3. ],
//        [4. , 5. , 6. ]])

xushiwei, Jul 02 '20

The simplified form of 2D vectors goes a step further than Python; it is close to MATLAB syntax. Great idea!

wangkuiyi, Jul 02 '20

Here is a typical PyTorch program in four different languages:

  • The Python version comes from the official tutorial.
  • The C++ version calls the ATen and Torch csrc C++ libraries. Thanks to Jia-Kai Liu, a tech lead of PyTorch, for teaching me everything about the C/C++ core of PyTorch. Please follow the instructions in https://github.com/wangkuiyi/cxxtorch to run this program.
  • The Go version calls an imaginary Go binding of ATen and csrc.
  • The Go+ version is also imaginary.
C++ version:

#include <iostream>

#include "torch/script.h"
#include "torch/optim.h"

int main() {
  int N = 64, D_in = 1000, H = 100, D_out = 10;
  double learning_rate = 1e-3;

  auto x = torch::randn({N, D_in},
                        at::TensorOptions().requires_grad(false));
  auto y = torch::randn({N, D_out},
                        at::TensorOptions().requires_grad(false));

  // The Adam optimizer wants parameters in a std::vector.
  std::vector<at::Tensor> params = {
    torch::randn({D_in, H},
                 at::TensorOptions().requires_grad(true)),
    torch::randn({H, D_out},
                 at::TensorOptions().requires_grad(true))};

  // Build the optimizer.
  torch::optim::Adam adam(params,
                          torch::optim::AdamOptions(learning_rate));

  // Make quick references for using in the forward pass.
  const at::Tensor & w1 = adam.parameters()[0];
  const at::Tensor & w2 = adam.parameters()[1];

  for (int i = 0; i < 500; ++i) {
    auto y_pred = at::mm(at::clamp(at::mm(x, w1), 0), w2);
    auto loss = at::sum(at::pow(at::sub(y_pred, y), 2));

    if ((i % 100) == 99) {
      std::cout << "loss = " << loss << std::endl;
    }

    adam.zero_grad();
    loss.backward();
    adam.step();
  }
  return 0;
}
Go version:

package main

import (
	"fmt"

	at "github.com/gotorch/gotorch/aten"
	"github.com/gotorch/gotorch/torch"
	"github.com/gotorch/gotorch/torch/optim"
)

func main() {
	N, Din, H, Dout := 64, 1000, 100, 10
	learningRate := 1e-3

	x := torch.RandN([]int{N, Din},
		at.TensorOptions().RequiresGrad(false))
	y := torch.RandN([]int{N, Dout},
		at.TensorOptions().RequiresGrad(false))

	params := []at.Tensor{
		torch.RandN([]int{Din, H},
			at.TensorOptions().RequiresGrad(true)),
		torch.RandN([]int{H, Dout},
			at.TensorOptions().RequiresGrad(true)),
	}

	adam := optim.NewAdam(params, optim.AdamOptions(learningRate))

	// Exported accessor: an unexported parameters() could not be called from package main.
	w1 := adam.Parameters()[0]
	w2 := adam.Parameters()[1]

	for i := 0; i < 500; i++ {
		// Same forward pass as the C++ version: mm(clamp(mm(x, w1), 0), w2).
		yPred := at.MM(at.Clamp(at.MM(x, w1), 0), w2)
		loss := at.Sum(at.Pow(at.Sub(yPred, y), 2))

		if i%100 == 99 {
			fmt.Println("loss =", loss)
		}

		adam.ZeroGrad()
		loss.Backward()
		adam.Step()
	}
}

Go+ version:

package main

import (
	"fmt"

	"github.com/gotorch/gotorch/at"
	"github.com/gotorch/gotorch/torch"
	"github.com/gotorch/gotorch/torch/optim"
)

func main() {
	N, Din, H, Dout := 64, 1000, 100, 10

	x := torch.RandN(N, Din, requires_grad=false)
	y := torch.RandN(N, Dout, requires_grad=false)

	w1 := torch.RandN(Din, H, requires_grad=true)
	w2 := torch.RandN(H, Dout, requires_grad=true)

	learningRate := 1e-3
	adam := optim.NewAdam([w1, w2], lr=learningRate)

	for i := 0; i < 500; i++ {
		yPred := at.MM(at.Clamp(at.MM(x, w1), 0), w2)
		loss := at.Sum(at.Pow(at.Sub(yPred, y), 2))

		if i%100 == 99 {
			fmt.Println("loss =", loss)
		}

		adam.ZeroGrad()
		loss.Backward()
		adam.Step()
	}
}

Python version:

import torch

N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in, requires_grad=False)
y = torch.randn(N, D_out, requires_grad=False)

w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-3
adam = torch.optim.Adam([w1, w2], lr=learning_rate)

for t in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    if t % 100 == 99:
        print(t, loss.item())

    adam.zero_grad()
    loss.backward()
    adam.step()

From the four programs above, we can see that:

  1. In terms of lines of source code, the Go binding could be as concise as the C/C++ API.
  2. If we want the Go+ version to be as concise as the Python version, the primary requirement for the Go+ transpiler is to support named function parameters, for example, x := torch.RandN(N, Din, requires_grad=false).
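Until Go+ supports named parameters, plain Go code usually approximates keyword arguments with the functional-options idiom. The sketch below is that swapped-in Go technique, not part of the proposal; TensorOptions, Option, RequiresGrad, Tensor and RandN are hypothetical stand-ins that only record their arguments instead of sampling random tensors.

```go
package main

import "fmt"

// TensorOptions holds the optional settings; its zero value is the default.
type TensorOptions struct {
	RequiresGrad bool
}

// Option mutates TensorOptions (the "functional options" idiom).
type Option func(*TensorOptions)

// RequiresGrad plays the role of Python's requires_grad= keyword.
func RequiresGrad(b bool) Option {
	return func(o *TensorOptions) { o.RequiresGrad = b }
}

// Tensor is a hypothetical placeholder that just remembers its arguments.
type Tensor struct {
	Shape []int
	Opts  TensorOptions
}

// RandN applies each option in order to a zero-valued TensorOptions.
func RandN(shape []int, opts ...Option) Tensor {
	var o TensorOptions
	for _, opt := range opts {
		opt(&o)
	}
	return Tensor{Shape: shape, Opts: o}
}

func main() {
	// requires_grad defaults to false when no option is passed.
	x := RandN([]int{64, 1000})
	// Like requires_grad=True in Python.
	w1 := RandN([]int{1000, 100}, RequiresGrad(true))
	fmt.Println(x.Opts.RequiresGrad, w1.Opts.RequiresGrad) // false true
}
```

The idiom keeps call sites readable, at the cost of one small helper function per option.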

wangkuiyi, Jul 21 '20

In Go+, we can write it as follows:

package main

import (
	"fmt"

	"github.com/gotorch/gotorch/at"
	"github.com/gotorch/gotorch/torch"
	"github.com/gotorch/gotorch/torch/optim"
)

N, Din, H, Dout := 64, 1000, 100, 10

x := torch.RandN(N, Din, {})
y := torch.RandN(N, Dout, {})

w1 := torch.RandN(Din, H, {RequiresGrad: true})
w2 := torch.RandN(H, Dout, {RequiresGrad: true})

learningRate := 1e-3
adam := optim.NewAdam([w1, w2], {LR: learningRate})

for i := 0; i < 500; i++ {
	yPred := at.MM(at.Clamp(at.MM(x, w1), 0), w2)
	loss := at.Sum(at.Pow(at.Sub(yPred, y), 2))

	if i%100 == 99 {
		fmt.Println("loss =", loss)
	}

	adam.ZeroGrad()
	loss.Backward()
	adam.Step()
}

New language features:

  • https://github.com/goplus/gop/issues/486: The struct type of a function-call argument can be deduced automatically.
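For comparison, here is roughly how those calls look in plain Go, where the option struct type has to be spelled out at every call site; that type name is what issue #486 would let Go+ deduce. All types and functions here are hypothetical stand-ins for a GoTorch binding and perform no real computation.

```go
package main

import "fmt"

// Hypothetical GoTorch-like options: the zero value supplies the defaults.
type TensorOptions struct {
	RequiresGrad bool
}

type AdamOptions struct {
	LR float64
}

type Tensor struct {
	Shape []int
	Opts  TensorOptions
}

func RandN(rows, cols int, opts TensorOptions) Tensor {
	return Tensor{Shape: []int{rows, cols}, Opts: opts}
}

type Adam struct {
	Params []Tensor
	Opts   AdamOptions
}

func NewAdam(params []Tensor, opts AdamOptions) *Adam {
	return &Adam{Params: params, Opts: opts}
}

func main() {
	// Go needs TensorOptions{...}; with #486, Go+ could accept {RequiresGrad: true}.
	x := RandN(64, 1000, TensorOptions{})
	w1 := RandN(1000, 100, TensorOptions{RequiresGrad: true})
	w2 := RandN(100, 10, TensorOptions{RequiresGrad: true})
	adam := NewAdam([]Tensor{w1, w2}, AdamOptions{LR: 1e-3})
	fmt.Println(x.Opts.RequiresGrad, len(adam.Params), adam.Opts.LR) // false 2 0.001
}
```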

xushiwei, Jul 23 '20


There's still a problem: libtorch uses exceptions as its main error-handling mechanism, which has two consequences:

  1. We have to find a way to pass C++ exceptions to Go, as described in https://artem.krylysov.com/blog/2017/04/13/handling-cpp-exceptions-in-go/
    • Can Go+ provide a more efficient way to do the same thing?
  2. Once error handling is taken into account, the Go binding may not be as concise as the C/C++ API in terms of lines of source code, because we have to check the error status after every call.
    • Go+ already provides a neat error-handling syntax; how can we leverage it to simplify error handling in the Go binding?

shendiaomo, Jul 24 '20


Define error type: CppError

type CppError struct {
    what string
}

func (p *CppError) Error() string {
    return p.what
}

func NewCppError(what string) error {
    return &CppError{what: what}
}

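Wrap functions that may throw C++ exceptions

/*
#include <stdlib.h>

// InputArgs, OutputArgs and XXX are placeholders. cgo compiles this
// preamble as C, so the try/catch wrapper has to live in a .cpp file of
// the same package and be exported with C linkage:
//
//   extern "C" OutputArgs XXX_Wrap(InputArgs input, char **pwhat) {
//       try {
//           return XXX(input);
//       } catch (const std::exception &e) {
//           *pwhat = strdup(e.what());  // freed by the Go caller
//           return OutputArgs();
//       }
//   }
OutputArgs XXX_Wrap(InputArgs input, char **pwhat);
*/
import "C"

import "unsafe"

func XXX(input C.InputArgs) (output C.OutputArgs, err error) {
    var what *C.char
    output = C.XXX_Wrap(input, &what)
    if what != nil {
        defer C.free(unsafe.Pointer(what))
        err = NewCppError(C.GoString(what))
    }
    return
}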

xushiwei, Jul 26 '20