emma

Linear algebra library

Open aalexandrov opened this issue 10 years ago • 9 comments

The examples folder has several algorithms that mix the DataBag abstraction with linear algebra. Currently, we use Breeze, but we might switch to something else if we agree that it is better.

Let's summarize the pros and cons of the various options here:

Breeze

  • :+1: Delegates to native libraries (e.g. BLAS, LAPACK)
  • :+1: Seems to be quite popular already.
  • :-1: A bit clumsy type design.

Spire

  • :+1: Proper type design inspired by algebra systems.
  • :-1: Executes everything in Scala (no delegation to native libraries).

~~netlib-java~~ (used by Breeze)

  • :+1: Seems to be gaining a lot of traction
  • :-1: Java-based

aalexandrov, Dec 22 '15
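
Regarding Spire's "proper type design inspired by algebra systems" in the list above, here is a minimal plain-Scala sketch of what that buys. The `Semiring` trait and `GenericLinalg.dot` are hypothetical names, only loosely modeled on Spire's algebraic hierarchy; they are not Spire's actual API. The point is that a single generic `dot` works for any element type that has a semiring instance:

```scala
// Hypothetical type class, loosely modeled on Spire's algebraic hierarchy
// (not Spire's actual API).
trait Semiring[A] {
  def zero: A
  def plus(x: A, y: A): A
  def times(x: A, y: A): A
}

object Semiring {
  // Ordinary arithmetic on doubles.
  implicit val doubleSemiring: Semiring[Double] = new Semiring[Double] {
    def zero: Double = 0.0
    def plus(x: Double, y: Double): Double = x + y
    def times(x: Double, y: Double): Double = x * y
  }

  // Boolean semiring: OR as addition, AND as multiplication.
  implicit val booleanSemiring: Semiring[Boolean] = new Semiring[Boolean] {
    def zero: Boolean = false
    def plus(x: Boolean, y: Boolean): Boolean = x || y
    def times(x: Boolean, y: Boolean): Boolean = x && y
  }
}

object GenericLinalg {
  // Written once against the type class; works for every element type
  // with a Semiring instance in implicit scope.
  def dot[A](xs: Seq[A], ys: Seq[A])(implicit s: Semiring[A]): A = {
    require(xs.length == ys.length, "vectors must have equal length")
    xs.zip(ys).foldLeft(s.zero) { case (acc, (x, y)) => s.plus(acc, s.times(x, y)) }
  }
}
```

The same `dot` computes the usual inner product for `Double` and an OR-of-ANDs for `Boolean`; the price of this generality is that nothing here is delegated to native code.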

The netlib-java home page says that Breeze is built on top of it; basically, Breeze is a higher-level Scala wrapper around it. So I would vote against using netlib-java directly.

Then we have, on the one hand, Breeze, which Spark's MLlib uses, and on the other hand Spire, which is somewhat similar to Twitter's Algebird (which, according to its authors, can be used on top of Scalding or Storm).

What we need to ask ourselves is whether we want to completely translate the linear algebra API into DataBag comprehensions (slower, but all element types can be supported), or just chunk the matrices/vectors into blocks that are forwarded locally to native libraries (faster, but only numeric types can be supported). Ideally, we would handle (products of) numeric types natively and fall back to the JVM for more complex data (with some warnings, of course).

joroKr21, Jan 12 '16
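
The second option in the comment above (chunking matrices into blocks) can be sketched as follows, assuming square blocks whose size divides both matrix dimensions. A plain `Seq` stands in for a `DataBag`, and `Block`, `BlockedMatMul` and its methods are hypothetical names for illustration only. The collection-level part is a join on the inner block index followed by a group-by on the output block; `localMultiply` is a plain-Scala placeholder for the per-block call that would be delegated to a native library:

```scala
// Hypothetical block type: block (row, col) of a matrix, stored densely.
final case class Block(row: Int, col: Int, data: Array[Array[Double]])

object BlockedMatMul {
  // Placeholder for the per-block kernel; a real system would hand this
  // to a native BLAS routine instead of looping in Scala.
  def localMultiply(a: Array[Array[Double]], b: Array[Array[Double]]): Array[Array[Double]] =
    Array.tabulate(a.length, b(0).length) { (i, j) =>
      var sum = 0.0
      var p = 0
      while (p < b.length) { sum += a(i)(p) * b(p)(j); p += 1 }
      sum
    }

  // Element-wise sum of two equally sized blocks.
  def addBlocks(x: Array[Array[Double]], y: Array[Array[Double]]): Array[Array[Double]] =
    Array.tabulate(x.length, x(0).length)((i, j) => x(i)(j) + y(i)(j))

  // C(i, j) = sum over k of A(i, k) * B(k, j), computed on whole blocks.
  def multiply(aBlocks: Seq[Block], bBlocks: Seq[Block]): Seq[Block] = {
    val partials = for {
      a <- aBlocks
      b <- bBlocks
      if a.col == b.row // join on the shared inner block index
    } yield ((a.row, b.col), localMultiply(a.data, b.data))
    partials
      .groupBy(_._1) // group partial products by output block
      .map { case ((i, j), parts) => Block(i, j, parts.map(_._2).reduce(addBlocks)) }
      .toSeq
  }

  // Split a dense matrix into size x size blocks.
  def toBlocks(m: Array[Array[Double]], size: Int): Seq[Block] = {
    require(m.length % size == 0 && m(0).length % size == 0, "size must divide both dimensions")
    for {
      bi <- 0 until m.length / size
      bj <- 0 until m(0).length / size
    } yield Block(bi, bj, Array.tabulate(size, size)((i, j) => m(bi * size + i)(bj * size + j)))
  }

  // Reassemble blocks into a dense rows x cols matrix.
  def fromBlocks(blocks: Seq[Block], rows: Int, cols: Int, size: Int): Array[Array[Double]] = {
    val out = Array.ofDim[Double](rows, cols)
    for (b <- blocks; i <- 0 until size; j <- 0 until size)
      out(b.row * size + i)(b.col * size + j) = b.data(i)(j)
    out
  }
}
```

Only `localMultiply` and `addBlocks` touch block contents, so swapping them for native kernels (or, for non-numeric element types, generic JVM code) would leave the join/group-by structure unchanged.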

Yes, netlib-java is very low-level and used by Breeze.

I think numeric types cover a large share of all use cases, and I would vote for speed in their case, ideally even for local execution (through Breeze).

Nonetheless, I like the Scalding/Algebird approach very much. Allowing linear algebra operations on, for example, vectors of Bloom filters sounds really cool.

fschueler, Jan 12 '16
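
The "vectors of Bloom filters" idea from the comment above can be sketched as follows, assuming all filters share the same size and number of hash functions. A tiny Bloom filter forms a monoid under bitwise OR (the empty filter is the identity), so vectors of filters can be "added" element-wise like numeric vectors. `Monoid`, `Bloom` and `MonoidVectors` are hypothetical names in the spirit of Algebird, not Algebird's actual API:

```scala
import scala.collection.immutable.BitSet
import scala.util.hashing.MurmurHash3

// Hypothetical monoid type class (in the spirit of Algebird, not its API).
trait Monoid[A] {
  def zero: A
  def plus(x: A, y: A): A
}

// Deliberately tiny Bloom filter: numBits bits, numHashes seeded MurmurHash3 probes.
final case class Bloom(bits: BitSet, numBits: Int, numHashes: Int) {
  private def positions(item: String): Seq[Int] =
    (0 until numHashes).map(seed => Math.floorMod(MurmurHash3.stringHash(item, seed), numBits))

  def add(item: String): Bloom = copy(bits = positions(item).foldLeft(bits)(_ + _))

  // May return false positives, never false negatives.
  def mightContain(item: String): Boolean = positions(item).forall(bits.contains)
}

object Bloom {
  def empty(numBits: Int = 256, numHashes: Int = 3): Bloom = Bloom(BitSet.empty, numBits, numHashes)

  // OR of two filters with the same parameters is a valid filter for the union
  // of their items; this instance assumes the default parameters throughout.
  implicit val bloomMonoid: Monoid[Bloom] = new Monoid[Bloom] {
    def zero: Bloom = Bloom.empty()
    def plus(x: Bloom, y: Bloom): Bloom = {
      require(x.numBits == y.numBits && x.numHashes == y.numHashes, "incompatible filters")
      x.copy(bits = x.bits | y.bits)
    }
  }
}

object MonoidVectors {
  // Element-wise "vector addition" for any monoid-valued vector.
  def plus[A](xs: Vector[A], ys: Vector[A])(implicit m: Monoid[A]): Vector[A] = {
    require(xs.length == ys.length, "vectors must have equal length")
    xs.zip(ys).map { case (x, y) => m.plus(x, y) }
  }
}
```

For example, `MonoidVectors.plus(Vector(Bloom.empty().add("a")), Vector(Bloom.empty().add("c")))` yields a one-element vector whose filter reports both `"a"` and `"c"` as possibly present. None of this can be forwarded to BLAS, which is exactly the numerics-versus-generality trade-off discussed above.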

Oh God, I just realized that Breeze doesn't have the outer vector product :facepalm:

joroKr21, Jan 14 '16
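
As a stopgap for the missing outer product mentioned above, $u v^\top$ (an $m \times n$ matrix with entry $(i, j)$ equal to $u_i v_j$) is easy to write by hand. A minimal plain-Scala sketch on nested arrays rather than Breeze types; the `Outer.outer` name is hypothetical:

```scala
object Outer {
  // Outer product u v^T: a (u.length x v.length) matrix with entry (i, j) = u(i) * v(j).
  def outer(u: Array[Double], v: Array[Double]): Array[Array[Double]] =
    Array.tabulate(u.length, v.length)((i, j) => u(i) * v(j))
}
```

For example, `Outer.outer(Array(1.0, 2.0), Array(3.0, 4.0, 5.0))` gives the rows `(3.0, 4.0, 5.0)` and `(6.0, 8.0, 10.0)`.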

But you use this in ALS, don't you?

aalexandrov, Jan 14 '16

No, I use Breeze only to invert the matrix. There's a ticket for the outer product on GitHub.

joroKr21, Jan 14 '16
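
For context on how the outer product and the matrix inverse meet in ALS: one common form of the regularized update for a user factor vector $x_u \in \mathbb{R}^k$ is shown below, assuming explicit ratings $r_{ui}$, item factor vectors $y_i$ for the set $I_u$ of items rated by user $u$, and an L2 penalty $\lambda$. The Emma ALS example may use a different variant (e.g. a penalty weighted by $|I_u|$).

$$
x_u = \Big( \sum_{i \in I_u} y_i\, y_i^\top + \lambda I_k \Big)^{-1} \sum_{i \in I_u} r_{ui}\, y_i
$$

The bracketed $k \times k$ matrix is a sum of outer products $y_i y_i^\top$, and the inverse is the step delegated to Breeze; in practice the linear system is usually solved directly rather than by forming the inverse explicitly.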

I'm leaving @akunft in charge of this.

aalexandrov, Apr 05 '16

I think we can close this, as the discussion has moved to #187. @stratosphere/emma-committers, does anybody object?

aalexandrov, Apr 20 '16

:+1:

fschueler, Apr 20 '16

New meta-issue in #188.

akunft, Apr 20 '16