Talking to Numpy

Open vegabook opened this issue 6 years ago • 4 comments

Brilliant library. Of course, NumPy has two decades of accumulated functionality on top, so there's still a lot of stuff Python-side that I'd want to use.

How can Matrex talk to NumPy efficiently via, say, erlport? Erlport sends Python results back as follows:

Erlang/OTP 21 [erts-10.0.8] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
Interactive Elixir (1.7.3) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> {:ok, pid} = :python.start()                   
{:ok, #PID<0.176.0>}
iex(2)> xx = :python.call(pid, :py, :eig, [10])        
{{:"$erlport.opaque", :python,
  <<128, 2, 99, 110, 117, 109, 112, 121, 46, 99, 111, 114, 101, 46, 109, 117,
    108, 116, 105, 97, 114, 114, 97, 121, 10, 95, 114, 101, 99, 111, 110, 115,
    116, 114, 117, 99, 116, 10, 113, 1, 99, 110, 117, 109, 112, 121, ...>>},
 {:"$erlport.opaque", :python,
  <<128, 2, 99, 110, 117, 109, 112, 121, 46, 99, 111, 114, 101, 46, 109, 117,
    108, 116, 105, 97, 114, 114, 97, 121, 10, 95, 114, 101, 99, 111, 110, 115,
    116, 114, 117, 99, 116, 10, 113, 1, 99, 110, 117, 109, 112, ...>>}}
iex(3)> xx = :python.call(pid, :py, :eig_msgpack, [10])
<<146, 133, 164, 116, 121, 112, 101, 164, 60, 99, 49, 54, 164, 107, 105, 110,
  100, 160, 162, 110, 100, 195, 165, 115, 104, 97, 112, 101, 145, 10, 164, 100,
  97, 116, 97, 218, 0, 160, 61, 85, 216, 39, 118, 191, 18, 64, 0, 0, 0, 0, ...>>
iex(4)> Msgpax.unpack!(xx)
[
  %{
    "data" => <<61, 85, 216, 39, 118, 191, 18, 64, 0, 0, 0, 0, 0, 0, 0, 0, 152,
      188, 99, 203, 39, 212, 235, 191, 0, 0, 0, 0, 0, 0, 0, 0, 176, 49, 23, 6,
      60, 177, 190, 63, 82, 69, 170, 45, 5, 196, 229, 63, ...>>,
    "kind" => "",
    "nd" => true,
    "shape" => '\n',
    "type" => "<c16"
  },
  %{
    "data" => <<24, 3, 175, 142, 209, 167, 205, 191, 0, 0, 0, 0, 0, 0, 0, 0, 78,
      49, 82, 243, 187, 186, 202, 191, 0, 0, 0, 0, 0, 0, 0, 0, 24, 28, 80, 170,
      100, 115, 198, 191, 68, 9, 19, 108, 248, 3, 178, ...>>,
    "kind" => "",
    "nd" => true,
    "shape" => '\n\n',
    "type" => "<c16"
  }
]

As you can see, calling eig, which doesn't use msgpack, just returns each NumPy array as an opaque pickled blob. However, if we msgpack the NumPy arrays first, we get a more structured return value, which might be useful. Could the "data" field go into a Matrex matrix? How would one go about doing that?
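One way to approach the question above, as a rough sketch rather than a confirmed Matrex API: in the unpacked output, "type" => "<c16" means little-endian complex128, and the shape values '\n' and '\n\n' are just how IEx prints the integer lists [10] and [10, 10]. The helper names below (`unpack_c16`, `to_matrex_binary`) are hypothetical, and the target layout (two little-endian uint32 values for rows and columns, followed by float32 values) is an assumption about Matrex's binary format that should be checked against the library before relying on it. Eigenvalues can be complex, so this sketch keeps only the real parts.

```python
import struct

def unpack_c16(data, shape):
    """Decode little-endian complex128 bytes ("<c16") into a flat list of complex."""
    count = 1
    for dim in shape:
        count *= dim
    # Each complex128 value is two consecutive float64 values: real, then imaginary.
    floats = struct.unpack("<%dd" % (2 * count), data)
    return [complex(floats[i], floats[i + 1]) for i in range(0, len(floats), 2)]

def to_matrex_binary(values, rows, cols):
    """Pack real numbers as uint32 rows, uint32 cols, then float32 values.

    ASSUMPTION: this header-plus-float32 layout is what Matrex expects.
    """
    if len(values) != rows * cols:
        raise ValueError("expected %d values, got %d" % (rows * cols, len(values)))
    header = struct.pack("<II", rows, cols)
    body = struct.pack("<%df" % len(values), *values)
    return header + body

# A tiny stand-in for one entry of the unpacked msgpack result.
payload = {
    "type": "<c16",
    "shape": [2],
    "data": struct.pack("<4d", 1.5, 0.0, -2.0, 0.25),
}

eigenvalues = unpack_c16(payload["data"], payload["shape"])
real_parts = [z.real for z in eigenvalues]
blob = to_matrex_binary(real_parts, 1, len(real_parts))
```

If the layout assumption holds, a function like this could run on the Python side and return `blob` through erlport, so that the Elixir side only has to wrap the binary instead of decoding msgpack. Note that float32 loses precision relative to NumPy's float64.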

Here by the way is the Python code:

from __future__ import print_function
import numpy as np
import pdb
import IPython
import string
import msgpack
import msgpack_numpy as m
m.patch() # patch msgpack to do numpy

def eig(n):
    np.random.seed(8472)
    xx = np.random.rand(n * n).reshape(n, n)
    yy = np.linalg.eig(xx)
    return yy

def eig_msgpack(n):
    return msgpack.packb(eig(n))

def dicadd(d):
    # Join the keys with spaces (works on Python 2 and 3) and sum the values.
    return {" ".join(d.keys()): sum(d.values())}

And here are my mix.exs deps:

  defp deps do
    [
      {:erlport, "~> 0.10.0"},
      {:benchwarmer, "~> 0.0.2"},
      {:msgpax, "~> 2.0"}
    ]
  end

vegabook avatar Sep 23 '18 13:09 vegabook

Hello!

Thanks for your praise!

What you suggest sounds like a nice idea. I looked at the NumPy binary format several months ago; if I remember correctly, it should be perfectly compatible with the new Matrex format (the new version is in the 'array' branch), because that format was inspired by NumPy.

In this format, the data is stored separately in binary form, and meta fields such as shape, data type, and strides are stored as fields of a struct.
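To illustrate the general idea of keeping raw data apart from its metadata, here is a minimal sketch of a NumPy-style strided array. It is not the actual layout of the 'array' branch: the class name `StridedArray`, its fields, and the use of `struct` format characters for the data type are all assumptions made for illustration.

```python
import struct
from dataclasses import dataclass

@dataclass
class StridedArray:
    data: bytes      # raw element bytes, shared between views
    dtype: str       # struct format character, e.g. "f" for float32
    shape: tuple     # number of elements along each dimension
    strides: tuple   # bytes to step along each dimension

    def get(self, *index):
        """Read one element by computing its byte offset from the strides."""
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        for i, dim in zip(index, self.shape):
            if not 0 <= i < dim:
                raise IndexError("index out of range")
        offset = sum(i * s for i, s in zip(index, self.strides))
        return struct.unpack_from("<" + self.dtype, self.data, offset)[0]

    def transposed(self):
        """Return a transposed view: same bytes, reversed shape and strides."""
        return StridedArray(self.data, self.dtype, self.shape[::-1], self.strides[::-1])

# A 2x3 row-major float32 matrix: each row is 12 bytes, each element 4 bytes.
raw = struct.pack("<6f", 1, 2, 3, 4, 5, 6)
a = StridedArray(raw, "f", (2, 3), (12, 4))
t = a.transposed()
```

The point of the design is that operations such as transposition only touch the metadata, so the binary can be passed between NumPy and another runtime without copying or reordering the elements.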

The 'array' branch is generally ready for merging, except that I am stuck on the ASCII visualization of multi-dimensional matrices.

versilov avatar Sep 24 '18 16:09 versilov

I'll take a look.

Also, there's Apache Arrow, which seems to be gaining traction with the Python Ray framework and GPU data frames. It's basically being written by the guy who wrote Pandas. I'll dig around and see how compatible it is with your format.

Personally, I would love to be able to do the maximum I can in Elixir. For production use cases, Python is just so stone age now. It's great that you're addressing what to me is the Erlang ecosystem's single biggest weakness.

vegabook avatar Sep 24 '18 17:09 vegabook

Hey @versilov,

First of all thank you for this amazing library! 👏

This idea by @vegabook sounds quite interesting. Are there any updates?

devnacho avatar Nov 29 '18 14:11 devnacho

Hello @devnacho!

Actually, the Matrex format in the 'array' branch is already very close to NumPy's. The data is stored separately, and meta fields such as size, type, and strides are members of a map.

I still have not merged this branch into master, because I am stuck on the ASCII display of multi-dimensional matrices.

Guess I should release it with broken multi-dimensional display.

versilov avatar Nov 30 '18 15:11 versilov