* einsum2
parallel einsum products of 2 tensors
This is a Python package containing a parallel implementation of a subset of ~numpy.einsum~ functionality.
~numpy.einsum~ is a fantastic function for multiplying ~numpy~ arrays. However, ~numpy.dot~ and ~numpy.tensordot~ are typically faster, especially if ~numpy~ is linked to a parallel implementation of BLAS: then ~numpy.dot~ and ~numpy.tensordot~ will take advantage of multiple CPUs, whereas ~numpy.einsum~ remains single-threaded.
The trouble is, some ~einsum~ products are impossible to express as ~dot~ or ~tensordot~. For example,
: numpy.einsum("ijm,kmi->ijk", A, B)
returns a tensor C with C_{i,j,k} = \sum_{m} A_{i,j,m} B_{k,m,i}. This cannot be expressed as a ~dot~ or ~tensordot~ because the shared axis i is included in the output C.
This operation /can/ be expressed in terms of ~numpy.matmul~; in particular, this example is equivalent to ~numpy.matmul(A, np.transpose(B))~. However, on my machine, ~numpy.matmul~ does not appear to take advantage of parallel BLAS with multiple cores. This may change in the future (last year Intel introduced [[https://software.intel.com/en-us/articles/introducing-batch-gemm-operations][batched GEMM]] operations), but in the meantime, you can use ~einsum2~ to parallelize such ~einsum~ calls.
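For concreteness, here is a small ~numpy~-only sketch (the shapes are arbitrary) checking that the ~einsum~ and ~matmul~ formulations agree:
: import numpy as np
:
: # arbitrary small shapes: A is (i, j, m), B is (k, m, i)
: A = np.random.rand(3, 4, 5)
: B = np.random.rand(6, 5, 3)
:
: C1 = np.einsum("ijm,kmi->ijk", A, B)
: # np.transpose(B) reverses the axes to shape (i, m, k), so matmul
: # batches over the shared axis i and sums over m
: C2 = np.matmul(A, np.transpose(B))
:
: assert np.allclose(C1, C2)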
~einsum2~ is also compatible with the [[https://github.com/HIPS/autograd][autograd]] package for automatic differentiation.
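Here is a minimal sketch of differentiating through ~einsum2~ with autograd (assuming both packages are installed; the scalar wrapper and shapes are just for illustration):
: import numpy as np
: import autograd.numpy as anp
: from autograd import grad
: import einsum2
:
: B = np.random.rand(6, 5, 3)
:
: # scalar-valued wrapper so that grad() applies
: def f(A):
:     return anp.sum(einsum2.einsum2("ijm,kmi->ijk", A, B))
:
: A = np.random.rand(3, 4, 5)
: dA = grad(f)(A)  # gradient with the same shape as A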
** Installation
Pre-requisites are a C compiler that supports OpenMP, ~Cython~, ~numpy~, and ~pip~. Assuming you satisfy these requirements, you can install with
: pip install .
at the top-level directory of ~einsum2~.
If you have problems installing or running, it's probably because of your C compiler. On OSX in particular, the default compiler ~clang~ does not support OpenMP and thus won't work. I've also occasionally run into issues using Anaconda Python on Linux, apparently because of the old gcc-4 it was compiled with -- however, I'm currently unable to reproduce this error.
In case you are having issues with the C compiler, it is recommended to install and run ~einsum2~ in a virtual environment with a compatible C compiler. For example, if you use Anaconda Python, you can do the following:
- Create a new virtual environment named ~env_name~ with ~conda create -n env_name python=3.5 anaconda~ (alternatively, you can use ~python=2.7 anaconda~).
- Switch to the new environment with ~source activate env_name~.
- Do ~conda install gcc~ to install the Anaconda distribution of ~gcc~, which both supports OpenMP and is fully compatible with Anaconda Python. Note this clobbers your system ~gcc~, which is why we are using a virtual environment here.
- Finally, install ~einsum2~ with ~CC=gcc pip install .~ (the ~CC=gcc~ is required on OSX, otherwise the installation will try to use the default ~clang~ compiler that does not support OpenMP).
To test whether the parallelization is working, try ~sh test/profile_einsum2.sh~; you should see a speedup for the function calls that use multiple threads (~OMP_NUM_THREADS > 1~).
** Usage
*** ~einsum2~
This computes ~einsum~ products that can be expressed in terms of ~matmul~, ~transpose~, and ~reshape~ operations. It can be called in either of two forms:
: einsum2.einsum2(a, a_sublist, b, b_sublist, out_sublist)
or
: einsum2.einsum2(subscripts_str, a, b)
Here ~a~ and ~b~ are tensors. In the first format, ~a_sublist~ and ~b_sublist~ label the indices of ~a~ and ~b~, while ~out_sublist~ gives the indices of the output array. In the second format, ~subscripts_str~ is a string specifying the subscripts of ~a~, ~b~, and the output, and looks like ~a_subs,b_subs->out_subs~.
The official [[https://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html][numpy.einsum documentation]] provides more details about specifying the subscripts; here is an example. The following two calls are equivalent:
: einsum2(A, [0,1,2,3], B, [4,1,0], [0,2,4])
and
: einsum2("ijkm,nji->ikn", A, B)
Both return a tensor C with C_{ikn} = \sum_{j,m} A_{ijkm} B_{nji}.
Unlike ~numpy.einsum~, we allow the subscripts in the first format to be a list of arbitrary hashable keys, so
: einsum2(A, ["zero","one","two","three"], B, ["four","one","zero"], ["zero","two","four"])
would also be allowed.
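As a quick sanity check, here is a minimal sketch (assuming ~einsum2~ and ~numpy~ are importable; the shapes are arbitrary) comparing both call formats against ~numpy.einsum~:
: import numpy as np
: import einsum2
:
: A = np.random.rand(2, 3, 4, 5)  # subscripts i, j, k, m
: B = np.random.rand(6, 3, 2)     # subscripts n, j, i
:
: C_str = einsum2.einsum2("ijkm,nji->ikn", A, B)
: C_list = einsum2.einsum2(A, [0, 1, 2, 3], B, [4, 1, 0], [0, 2, 4])
:
: assert np.allclose(C_str, np.einsum("ijkm,nji->ikn", A, B))
: assert np.allclose(C_list, C_str)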
~einsum2~ has several limitations compared to ~numpy.einsum~: it only operates on two input arrays, does not allow diagonal operations (repeated subscripts on the same array), and requires the output subscripts to always be specified.
Unlike the standard ~numpy.einsum~, ~einsum2~ will perform computations in parallel. The number of parallel threads is selected automatically, but you can also control this with the environment variable ~OMP_NUM_THREADS~.
To perform the parallel computation, ~einsum2~ uses ~numpy.dot~ when possible, and otherwise falls back to a parallel for loop. The advantage of using ~numpy.dot~ is that it calls BLAS, which is much faster than a for loop. However, you need to make sure ~numpy~ is compiled against a parallel BLAS implementation such as MKL or OpenBLAS. You won't need to worry about this for most packaged, precompiled versions of ~numpy~ (e.g. Anaconda Python).
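For example, to pin the computation to four threads you can set the variable when launching Python (the script name here is just a placeholder):
: OMP_NUM_THREADS=4 python your_script.py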
*** ~batched_dot~
This is a parallel implementation of ~numpy.matmul~. More specifically, for 3-tensors ~a~ and ~b~,
: einsum2.batched_dot(a, b)
computes ~numpy.matmul(a, b)~ in parallel.
~batched_dot~ is currently only implemented for ~a~ and ~b~ that are 3-tensors. If the leading dimension has length 1, then ~batched_dot~ will use ~numpy.dot~ to take advantage of BLAS.
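A minimal sketch (arbitrary shapes) comparing ~batched_dot~ against ~numpy.matmul~:
: import numpy as np
: import einsum2
:
: # a stack of 10 matrix products: (10, 3, 4) @ (10, 4, 5) -> (10, 3, 5)
: a = np.random.rand(10, 3, 4)
: b = np.random.rand(10, 4, 5)
:
: assert np.allclose(einsum2.batched_dot(a, b), np.matmul(a, b))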
*** ~einsum1~
This is a convenience function for ~einsum~ operations on a single array. In particular,
: einsum2.einsum1(in_arr, in_sublist, out_sublist)
returns an array ~out_arr~ derived from ~in_arr~, but with subscripts given by ~out_sublist~: all subscripts of ~in_sublist~ not in ~out_sublist~ are summed out, and then the axes of ~in_arr~ are rearranged to match ~out_sublist~.
Like ~einsum2~, arbitrary keys are allowed to label the subscripts in ~einsum1~. Also like ~einsum2~, repeated subscripts (i.e. diagonal operations) are not supported.
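A minimal sketch (arbitrary array and subscript labels) of what ~einsum1~ computes, checked against ~numpy.einsum~:
: import numpy as np
: import einsum2
:
: X = np.random.rand(2, 3, 4)  # subscripts "i", "j", "k"
:
: # sum out "j", then rearrange the remaining axes to ("k", "i")
: Y = einsum2.einsum1(X, ["i", "j", "k"], ["k", "i"])
:
: assert np.allclose(Y, np.einsum("ijk->ki", X))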