Mix core shapes to avoid having too many indices
I would like to contract all but a few indices in a TT. At the moment, I am doing this by defining TT matrices of shape ((1, 1, 1, ..., n_ext), (m_1, ..., m_N)), and then contracting the m indices (with `t3f.matmul()`) with some other TT matrix.
The problem is that I'm interested in large N, so that I end up getting:
```
2018-03-08 14:55:41.571939: F tensorflow/core/framework/tensor_shape.cc:243] Check failed: ndims_byte() < MaxDimensions() (unsigned char value 254 vs. 254)Too many dimensions in tensor
```
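For concreteness, a rough sketch of the setup described above (the sizes here are invented for illustration):

```python
import t3f

# Invented sizes: many modes, one "external" index of size 10.
N = 150
# TT matrix with row modes (1, ..., 1, 10) and column modes (2, ..., 2).
W = t3f.random_matrix([[1] * (N - 1) + [10], [2] * N])
# TT matrix with row modes (2, ..., 2) and dummy column modes (1, ..., 1),
# standing in for a vector.
Phi = t3f.random_matrix([[2] * N, [1] * N])
y = t3f.matmul(W, Phi)
# Converting to a dense tensor needs 2 * N separate dimensions internally,
# which overflows TensorFlow's 254-dimension limit for large N.
dense = t3f.full(y)
```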
It would be nice to avoid this by allowing a TT to have both 3d and 4d cores. `t3f.full()` could still give a matrix. `t3f.matmul()` would require the 4d cores to be in the same locations.
For similar reasons, it would be good to have genuine matrix-vector multiplication. At the moment I do this with a TT matrix of shape ((m_1, ..., m_N), (1, 1, ...)), which also contributes to the "too many dimensions" issue.
A couple more thoughts.
- I see the problem with mixing cores: we don't know whether the middle index of a 3d core is a row or a column index of the matrix.
- If the second issue (multiplying a matrix and a vector) can be overcome, I guess one solution to the first issue would be to do `t3f.full(a[0, 0, ..., :])`. This won't work if `a` is a matrix, as `__getitem__` isn't implemented. Incidentally, `a[0, 0, ..., 0]` (setting all indices) doesn't seem to work -- one of them has to be a slice.
Can you please provide a small reproducible example? I'm not exactly sure I understood you correctly.
By "contracting all indices but a few", do you mean something like Y[i_3, i_5] = sum_{i_1, i_2, i_4} X[i_1, ..., i_5]?
I can implement it as an op if you need it.
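For what it's worth, a rough sketch of how such an op might look; `sum_over_modes` is a hypothetical helper (not part of t3f). Summing a 3d core over its mode index leaves a rank-by-rank matrix that can be absorbed into a neighbouring core:

```python
import tensorflow as tf
import t3f

def sum_over_modes(tt, modes_to_sum):
    """Hypothetical helper (not in t3f): sum a TT tensor over the given modes.

    Summing a core of shape (r1, n, r2) over its mode index leaves an
    (r1, r2) matrix, which is folded into the next surviving core.
    """
    kept = []
    carry = None  # pending (r1, r2) matrix from summed-out cores
    for k, core in enumerate(tt.tt_cores):
        if carry is not None:
            core = tf.einsum('ab,bnc->anc', carry, core)
            carry = None
        if k in modes_to_sum:
            carry = tf.reduce_sum(core, axis=1)
        else:
            kept.append(core)
    if carry is not None:
        # Trailing modes were summed out: fold the matrix into the last kept core.
        kept[-1] = tf.einsum('anb,bc->anc', kept[-1], carry)
    return t3f.TensorTrain(kept)

# Y[i_3, i_5] = sum_{i_1, i_2, i_4} X[i_1, ..., i_5]
X = t3f.random_tensor([2, 3, 4, 5, 6])
Y = sum_over_modes(X, {0, 1, 3})  # TT tensor with modes (4, 6)
```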
If you still need to multiply a tensor by a matrix, I can provide something like t3f.to_tt_vector to convert between representations, but I'm still not sure about the use case.
As for indexing TT matrices, I haven't implemented it because I was not sure about the API. Should it be something like M[i_1, ..., i_d, j_1, ..., j_d], or M[I, j] where I and j encode the indices of the row and column of the underlying matrix? And what to do in the batch case?
If you have ideas about the right API here, it is very easy to implement.
I'm implementing a TT-based classifier similar to Stoudenmire and Schwab's paper and also yours. At the moment I implement f^l(x) = W^l Phi(x) in the following way:
- W is a TT matrix of shape ((1, 1, 1, ..., L), (F, F, F, ...)), where L is the number of labels and F is the number of features.
- Phi is a TT matrix of shape ((F, F, ...), (1, 1, ...)) with rank-one cores.
- I use `t3f.matmul` to contract them together, yielding a TT matrix of shape ((1, 1, ..., L), (1, 1, ...)).

This is where the "too many dimensions" problem rears its head. In an ideal world, W would have shape ((None, None, ..., L), (F, F, ...)), Phi would have shape (F, F, ...), and I would just contract the feature indices, yielding a single label index.
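A sketch of this construction, with invented sizes, to make the shapes concrete:

```python
import t3f

F, d, L = 2, 50, 10  # feature size, number of features, number of labels (invented)
# W: TT matrix with row modes (1, ..., 1, L) and column modes (F, ..., F).
W = t3f.random_matrix([[1] * (d - 1) + [L], [F] * d])
# Phi: rank-one TT matrix with row modes (F, ..., F) and dummy column modes.
Phi = t3f.random_matrix([[F] * d, [1] * d], tt_rank=1)
# f^l(x) = W^l Phi(x): TT matrix of shape ((1, ..., 1, L), (1, ..., 1)).
f = t3f.matmul(W, Phi)
# t3f.full(f) fails for large enough d: the dense intermediate
# needs 2 * d dimensions.
logits = t3f.full(f)
```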
Actually this is easier to implement using the `TensorTrainBatch` class, since you can compute the matrix of pairwise inner products. I've written up a small example here.
Thanks for this! Two comments:
- It seems you've used the batch structure for the labels. Does that mean I've lost the possibility of using it for batches of data?
- (More seriously) You now have a completely new TT for each label, whereas in my example it was only the last core of the TT that knew about the label.
There was an issue on the TensorFlow repo about the merits of allowing genuine matrix-vector multiplication:
https://github.com/tensorflow/tensorflow/issues/9055
- You can still use a batch of objects: pairwise_flat_inner(x, w) is a matrix of cross products of size x.batch_size x w.batch_size (see the sketch after these bullets).
- Good point, I see. But where do you get the MaxDimensions error from? I just tried to multiply t3f.random_matrix([[2]*256, [1]*256]) by its own transpose, and it works fine.
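A small sketch of the batch approach (sizes invented):

```python
import t3f

# A batch of 32 data tensors and a batch of 10 label-weight tensors.
x = t3f.random_tensor_batch([2] * 20, batch_size=32)
w = t3f.random_tensor_batch([2] * 20, batch_size=10)
# logits[i, l] = <x_i, w_l>, a dense 32 x 10 matrix.
logits = t3f.pairwise_flat_inner(x, w)
```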
#56
I added matrix-vector multiplication to my fork here.
The "too many dimensions" error appears when you take `t3f.full` of the result.
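To illustrate the distinction, a sketch based on the example above:

```python
import t3f

a = t3f.random_matrix([[2] * 256, [1] * 256])
b = t3f.matmul(a, t3f.transpose(a))  # fine: everything stays in TT format
dense = t3f.full(b)  # fails: the dense intermediate needs 2 * 256 dimensions
```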
Wow, you put in a lot of work!
However, I intentionally didn't want to mix matrix/vector objects with tensors, because I heard that this mixing was one of the major sources of confusion in the TTPY framework ("why is a matrix a matrix, but a vector is a tensor??").
What do you think: would t3f.vector_to_tensor and t3f.tensor_to_vector solve your problems? Then you would be able to do t3f.full(t3f.vector_to_tensor(t3f.matmul(A, b))).
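A possible sketch of what `vector_to_tensor` could do (hypothetical, not in t3f): a TT "vector" is a TT matrix whose column modes are all 1, so each 4d core can simply have its dummy mode squeezed out:

```python
import tensorflow as tf
import t3f

def vector_to_tensor(tt_vec):
    """Hypothetical helper: turn a TT matrix of shape ((m_1, ..., m_d), (1, ..., 1))
    into a plain TT tensor by squeezing the dummy column mode of each core."""
    new_cores = []
    for core in tt_vec.tt_cores:
        r1, m, _, r2 = core.get_shape().as_list()  # core shape: (r1, m, 1, r2)
        new_cores.append(tf.reshape(core, (r1, m, r2)))
    return t3f.TensorTrain(new_cores)

# Usage, following the suggestion above:
# t3f.full(vector_to_tensor(t3f.matmul(A, b)))
```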