Allocations using `mul!`
Thanks for this package! I've noticed there are some allocations when using mul!(y,A,x), where x,y are <: Vector and A is a KroneckerProduct. This seems to be due to alloc_temp_array in _kron_mul_fast_square!(C, B, factors).
Since we're calling mul! many times, these allocations add up. Would it make sense to have a version of KroneckerProduct which stores this temporary array as a hidden field to avoid allocating?
I think TensorOperations does something like that.
I am a bit hesitant to allow hidden matrices to be generated for this package, though it might make sense to have versions of mul! where users can provide a temp matrix?
Thanks for the note on TensorOperations.jl. A version of mul! with a temporary array argument also makes sense. I'll take a look at TensorOperations.jl, but if Kronecker.jl fits better I'm happy to try a PR for the new mul!.
Kronecker is likely easier to use if you are just doing matrix algebra. TO is a bit more general and advanced and can likely attain somewhat higher performance. Take a look and share your thoughts if we should extend Kronecker in this direction.
After looking at TO, I think Kronecker.jl would be a better fit. I can work on a PR for the temporary array argument mul!, or if you have some ideas I'm happy to try to implement them.
Maybe you can best take the lead in this and we can look together if it can be improved. Feel free to open a PR!