koma
koma copied to clipboard
Why is there no Vector interface?
I would like to ask what motivated your choice to cover both vectors and matrices by the unified Matrix<T> interface. Are there some underlying technical difficulties? Did you consider adding Vector<T> in the past and dismissed the idea?
Right now I am converting a lot between List<T> and matrices Matrix<T> about which I know that they are vector-shaped. It would make the code cleaner and more performant, if there was a Vector<T> interface which would extend List<T>.
If this is not on your agenda, I would like to do this in my code as a private extension and would welcome any advice that you have.
Thank you!
Having a separate Vector implementation that implements List would be useful. If you are interested in developing a prototype, a PR would be welcome.
One thing to consider when implementing it is how to handle column vector vs row vector. For example, in linear algebra x^T * x
could either be the inner product or the outer product depending on if x
is Nx1 or 1xN. Various projects have solved this in various ways, and it might be inspirational to check out how they've done it:
MATLAB
Octave (MATLAB) forces everything into a 2-dimensional container by default, so 1xN vs Nx1 is explicit. However, using ;
or
as the delimiter will choose col or row orientation.
> a=[1 2 3] % Row vector
a =
1 2 3
> size(a) % Note the two-dimensional shape
ans =
1 3
> a' % Transpose works
ans =
1
2
3
> b=[1;2;3] % We can force col vectors
b =
1
2
3
> a'*a % Outer product
ans =
1 2 3
2 4 6
3 6 9
> a*a' % Inner product
ans = 14
For the ND case, it doesn't allow linalg operations:
> a=zeros(3,3,3)
> a'
error: transpose not defined for N-D object
Numpy
Numpy allows 1-dimensional tensors, and then makes arbitrary choices on what to do in linear algebra contexts, sometimes treating 1D vectors as either row or column vectors, sometimes producing errors.
>>> a=array([1,2,3])
>>> a
array([1, 2, 3])
>>> shape(a) # 1 dimension, no orientation
(3,)
>>> a.T # Transpose does nothing
array([1, 2, 3])
>>> a @ a # Inner product (first `a` is treated as row vec, second `a` treated as col vec)
14
>>> a.T @ a # Inner product still (transpose does nothing)
14
>>> a @ a.T # Inner product still (transpose does nothing)
14
>>> b = array([[1, 2, 3]]) # Explicit 2D container
>>> b.T # Now transpose works
array([[1],
[2],
[3]])
>>> b @ b.T # Expected inner product
array([[14]])
>>> b.T @ b # Expected outer product
array([[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
>>> b @ a # Inner product (Random choice of treating 1D vector `a` as column vector...)
array([14])
>>> a @ b # Bizarre error: why not outer product?
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 1 is different from 3)
>>> reshape(a, (3,1)) @ b # Force a cast on `a` to 2D to get outer product instead of error
array([[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
Julia
Julia defaults to 2D vectors like MATLAB for literals, but can still handle 1D vectors in linear algebra by marking vectors as col or row (via Adjoint type).
julia> a=[1 2 3] # Row vector (Note the two-dimensional shape)
1×3 Array{Int64,2}:
1 2 3
julia> a' # Transpose works for 2D arrays
3×1 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
1
2
3
julia> a*a' # Inner product
1×1 Array{Int64,2}:
14
julia> a'*a # Outer product
3×3 Array{Int64,2}:
1 2 3
2 4 6
3 6 9
julia> b=zeros(3) # Force a 1 dimensional vector, defaults to col vector
3-element Array{Float64,1}:
0.0
0.0
0.0
julia> b' # 1D vector transposed gets marked with Adjoint and works, produces a row vector
1×3 LinearAlgebra.Adjoint{Float64,Array{Float64,1}}:
0.0 0.0 0.0
julia> b*b' # Expected outer product
3×3 Array{Float64,2}:
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
julia> b'*b # Expected inner product
0.0
Koma
Koma right now has a 2D container (Matrix) as well as an N dimensional container (NDArray). To get a 1D container you'd have to use NDArray and specify N=1, which would lose access to linalg operations. This is similar to MATLAB's approach. You'd have to cast the NDArray to a matrix (and force 2D) in order to get back to linalg.
Possible enhancements
There's a few ways we could go here, and in all cases implementing List
would be do-able.
-
We could create new types or aliases
ColVector
andRowVector
which are really just 2D containers (Matrices) in the underlying implementation, but enforce that the data has size of 1 in one direction or another. This would be extremely straightforward to implement -
We could try to fix the interaction between NDArray and Matrix. I personally find Julia's approach to be reasonable and clear: 2D by default, but 1D still works via a special case (Adjoint types). I personally find numpy's approach of forgetting orientation data for 1D vectors and arbitrarily deciding what to do on 1D linalg operations (e.g.
dot()
or@
's behavior) to be confusing. -
It's entirely possible the clean solution here is to eliminate the Matrix type entirely and move to NDArray for everything. We've had that discussion in previous issues-- I think there's some clarity to be gained from statically typing the 2D special case, but there's tradeoffs either way.
Any other proposals would also be welcome.
Could you comment a little more on what problems you're running into? For example, is it really necessary that Matrix
and NDArray
directly implement List
? Or is the problem simply that toList()
is inefficient, since it makes a copy of the data? It should be possible to create a class that implements List
while being backed by a Matrix
or NDArray
, so that toList()
could be very fast. It even would be possible to have them implement List
directly, though I worry that would complicate the API too much.
Having the API cleanly distinguish between inner and outer products is also an important question. But I think it's a different question from what you were asking about?