fix aliasing in matrix algebra constructor
Previously diagonal elements of matrices converted from the base ring were aliased.
I think we already discussed this problem somewhere without reaching a conclusion on whether elements should avoid being aliased (I may well be misremembering). Do you have a use case for avoiding aliasing?
Yes, absolutely. Thanks for the quick reaction.
I found the aliasing while working on my modular forms code. If you use in-place arithmetic and Hecke is loaded, then you get wrong results without the fix:
```julia
using Hecke

R, t = PolynomialRing(ZZ, "t")
# MatrixAlgebra(R, 2) does not work, because Hecke does not overwrite mul! in
# that case.
M = MatrixSpace(R, 2, 2)
a = one(M)
b = M([1 0; 3 0])
c = M([1 1; 0 0])
b*c == mul!(a, b, c)
```
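For readers without Hecke installed, the underlying hazard can be shown in plain Julia; `Vector{Int}` stands in for a mutable ring element (like a flint/antic bignum), and the copying construction is roughly what this PR's fix amounts to:

```julia
# Plain-Julia illustration of the aliasing hazard.
x = [0]
A = fill(x, 2, 2)            # all four entries alias the SAME vector
@assert A[1, 1] === A[2, 2]  # aliased
A[1, 1][1] = 5               # mutate "one" entry in place...
@assert A[2, 2][1] == 5      # ...and every other entry changed too

# Constructing with copies avoids the problem:
B = [copy(x) for _ in 1:2, _ in 1:2]
B[1, 1][1] = 7
@assert B[2, 2][1] == 5      # the other entries are unaffected
```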
From a performance perspective, I think Hecke is totally right here. For instance, I have an implementation of FareySymbols which is slower by a factor of 100 without in-place arithmetic. Obviously, one can hand-code this in that setting.
Was the aliasing allowed to save memory or to save the deepcopy, which I know can also be a huge source of slowness?
I think if it fixes a bug and clearly isn't going to introduce new bugs we should merge this.
Yes, looks fine to me. Do we actually document somewhere which functions produce values that can safely be used as the first argument in mul! and friends?
On Fri, Dec 04, 2020 at 12:59:41AM -0800, thofma wrote:

> Yes, looks fine to me. Do we actually document somewhere which functions produce values that can safely be used as the first argument in `mul!` and friends?
Tricky issue, I don't know. But, to make things worse, the semantics are different for polynomials (well, for the coefficients of a poly) and matrices (for the entries). One is supposed to be fine with in-place mutation (poly), the other not so much. That is my recollection from long discussions...
Yes, it is documented on a ticket because we never got around to writing developer documentation:
https://github.com/Nemocas/Nemo.jl/issues/278
The relevant rule seems to be:
"matrix functions with an exclamation mark should not mutate the objects that occur as entries of the output matrix, though should be allowed to arbitrarily replace/swap the entries that appear in the matrix. In other words, these functions should be interpreted as inplace operations, rather than operations that are allowed to mutate the actual entries themselves."
With this ticket in mind, we don't need the PR here, as !-functions are not allowed to mutate the entries of the output.
The problem is that the add! function we define in Hecke does add!(A[i, j], B[i, j], C[i, j]) which I think is illegal according to https://github.com/Nemocas/Nemo.jl/issues/278. In particular according to the rule you cited.
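The distinction can be made concrete with a plain-Julia sketch (hypothetical helper names; `Vector{Int}` stands in for a mutable ring element):

```julia
# Legal under the Nemo.jl#278 rule: bind NEW objects into the output,
# never mutate the objects currently stored there.
function add_replacing!(A, B, C)
    for i in axes(A, 1), j in axes(A, 2)
        A[i, j] = B[i, j] + C[i, j]   # allocates a new entry
    end
    return A
end

# What the fast Hecke-style add! effectively does: write INTO the existing
# entries. Illegal under the rule, and wrong if the output's entries alias.
function add_mutating!(A, B, C)
    for i in axes(A, 1), j in axes(A, 2)
        A[i, j] .= B[i, j] .+ C[i, j]  # mutates the stored object in place
    end
    return A
end

mk() = [[10i + j] for i in 1:2, j in 1:2]
A1 = fill([0], 2, 2)              # entries alias each other
add_replacing!(A1, mk(), mk())
@assert A1[1, 1] == [22]          # correct despite the aliased output matrix

A2 = fill([0], 2, 2)
add_mutating!(A2, mk(), mk())
@assert A2[1, 1] == [44]          # wrong: every entry holds the last sum
```

This is exactly the failure mode of the `mul!` example above: the entry-mutating version is only safe when the output's entries are pairwise unaliased.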
Ok then. I guess the question becomes whether Hecke can be modified to obey the rule and still give good performance in @martinra's case?
No, it cannot. It is fast because it mutates the entries of the output matrix.
The function works as long as you know that in your output matrix, none of the entries alias each other. At the moment the output of zero_matrix and identity_matrix satisfy this. We usually use those as the first argument to !-functions. But of course this works because of an implementation detail.
I am fully behind https://github.com/Nemocas/Nemo.jl/issues/278, but maybe we need some more unsafe functions for matrices? Like add!! or something like that, where it is assumed that the entries of the output do not alias each other. I am not sure, but from our experience in the last 5 years I can tell that working with matrices is quite inefficient if we don't allow those entry-mutating functions.
In my specific case, I was working with matrices over QQab, the abelian closure of QQ, which in the background relies on Antic. So if we now work with, say, mul!! as suggested, we formally have the same problem: in mul!!(::MatElem{nf_elem}...) you would want to call mul!(::nf_elem...), which then may assume that entries are unaliased per Nemocas/Nemo.jl#278, since nf_elem is essentially a polynomial. But that would not be true for the current version of the matrix constructors. Or to say it more abstractly: the total aliasing of elements in the MatElem constructor contradicts any nontrivial assertion of partially unaliased entries.
I hope I understood it correctly. I agree that the current one(M) (where M is a matrix space) or M(a) constructors yield elements which cannot be used with the proposed add!! (or the current add! from Hecke).
I meant that the current zero_matrix(R, n, m) and identity_matrix(R, n) work fine with add!! (or the current add! from Hecke), because they don't do any aliasing at all.
So my suggestion was to introduce a constructor which guarantees no aliasing, and which is safe to use with add!!.
P.S.: We have purged all MatrixSpace calls from Hecke, because the zero_matrix/identity_matrix/matrix(R, ...) constructors are faster and do no caching.
I guess I was misunderstanding your suggestion. Having an additional constructor (or keyword) so that add!! works can be a solution.
Thanks for telling about the MatrixSpace constructors in Hecke. I will consider making similar changes in my code.
One thing that I thought of is that the proposed solution, and in fact Nemocas/Nemo.jl#278, would make it very difficult to write generic but efficient implementations for ring elements. For example, look at a hypothetical function power_by_squaring(a::RingElem, ::Int), which I actually have in my code and which is how I discovered this aliasing originally. Then, depending on whether a <: MatAlgElem or, say, a <: PolyElem, you need different implementations to provide good performance. While mul! is semantically always correct, you can write the function in such a way that the much faster mul!! is also possible. Notice that this cannot be solved by providing two implementations power_by_squaring(a::RingElem, ::Int) and power_by_squaring(a::MatAlgElem, ::Int), since there might be types outside of AbstractAlgebra that are not subtypes of MatAlgElem but effectively wrap such a type. One example that comes to mind is ResElem{MatAlgElem{PolyElem}}, which can be used to provide a more memory-local implementation of some MatAlgElem{NumFieldElem}. In my code I have quite a few types modelling various kinds of finite-dimensional representations, which fall into the same category.
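For concreteness, here is a hypothetical sketch of such a function (not the AbstractAlgebra API). Plain `*` keeps it correct for any ring element; the comments mark where the faster in-place calls would be substituted:

```julia
# Generic power by repeated squaring; works for anything supporting one() and *.
function power_by_squaring_sketch(a, n::Int)
    n >= 0 || throw(DomainError(n, "nonnegative exponent expected"))
    r = one(a)
    b = a
    while n > 0
        isodd(n) && (r = r * b)   # candidate for mul!!(r, r, b) if entries are unaliased
        b = b * b                 # likewise; note that b aliases itself here
        n >>= 1
    end
    return r
end

@assert power_by_squaring_sketch(2, 10) == 1024
@assert power_by_squaring_sketch([1 1; 0 1], 3) == [1 3; 0 1]
```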
I can for now think of at least two solutions to this, but neither is particularly appealing. One way is of course to revisit Nemocas/Nemo.jl#278, but I'm sure you had your reasons, so I understand that this might not be realistic. Yesterday evening I was also thinking about the rationale behind it, and I understand where these guidelines come from.
The second solution could be to provide a general fallback mul!!(as::RingElem...) = mul!(as...) and specialize it where necessary (relatively little code in AbstractAlgebra, and no big deal on my side either). The constructor is more work, since keywords are not dispatched on, and keywords would appear to me to be the preferred design here. But it would mean providing an additional ignored keyword for every RingElem constructor. Quite a mess, in my opinion, and a lot of work. There is a hack in the style of
```julia
struct CompletelyUnaliased end
const completely_unaliased = CompletelyUnaliased
(a::Set)(::Type{CompletelyUnaliased}, args...) = a(args...)
(a::MatAlgebra)(::Type{CompletelyUnaliased}, args...) = a(args...; completely_unaliased = true)
```
The downside with this is that it violates calling conventions in AbstractAlgebra.
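The fallback idea can be made concrete with toy stand-ins (this is NOT the AbstractAlgebra API; the names `mul!`/`mul!!` follow the proposal above, and the specialization is purely illustrative):

```julia
# Toy aliasing-safe mul!: returns a new object, never mutates.
mul!(z, a, b) = a * b
# Generic double-bang fallback: degrades to the safe version, always legal.
mul!!(z, a, b) = mul!(z, a, b)

# Opt-in specialization for a type whose entries are known not to alias:
mul!!(z::Matrix{Float64}, a::Matrix{Float64}, b::Matrix{Float64}) = (z .= a * b; z)

z = zeros(2, 2)
r = mul!!(z, [1.0 0.0; 0.0 1.0], [2.0 0.0; 0.0 2.0])
@assert r === z && z == [2.0 0.0; 0.0 2.0]   # fast path actually mutated z
@assert mul!!(0, 3, 4) == 12                 # fallback path
```

The appeal of this design is that generic code can always call `mul!!` and remain correct; only types that can guarantee unaliased entries pay the (small) cost of writing a specialization.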
Obviously, one can simply not solve that problem and require that any object that wraps matrix algebra and requires efficiency provides a specialized implementation. While this violates encapsulation, it could be defended by the argument that whoever requires best performance will need to go into the details of their data structures anyway.
Summarizing, this is really a design question, and since I'm merely downstream for Oscar, I would leave this to all of you to make a decision on. I've only tried to depict the bifurcations of Nemocas/Nemo.jl#278. I can effectively live with any of these solutions, and many others that I haven't thought of now. The one thing that would be important to me, is that we don't land in what I call the Sage-trap of totally degraded performance.
Thanks @martinra for your input.
Because I had to recall our reasoning behind https://github.com/Nemocas/Nemo.jl/issues/278, let me just write down some of the details here, as far as I remember. I think we thought of two solutions:
- Allow aliasing between matrix entries. Advantage is that `A[i, j] = z` does not have to copy and also `M([a b; c d])` does not have to make copies. Disadvantage is that no function is allowed to mutate `A[i, j]` directly, thus no efficient `add!` etc. functions.
- Disallow aliasing between matrix entries. Advantage is that we have fast entry-mutating functions like `add!`. Disadvantage is that `M([a b; c d])` must internally do `M(deepcopy([a b; c d]))` and `A[i, j] = z` must internally do `A[i, j] = deepcopy(z)`. To mitigate this, one can introduce `setindex!(A, i, j, z, copy = false)` and similar for the constructor.
I think at some point we tried out option 2, but it was a lot of work to go over the codebase to introduce copy = false in the right places. (Or maybe there was also a problem with copy = false itself; I don't remember right now.) Thus we chose option 1.
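The mitigation mentioned for option 2 could look roughly like this (hypothetical name and signature; `Vector{Int}` stands in for a mutable ring element):

```julia
# Hypothetical setter that skips the defensive deepcopy on request.
function set_entry!(A, i, j, z; copy::Bool = true)
    A[i, j] = copy ? deepcopy(z) : z
    return A
end

A = [[0] for _ in 1:2, _ in 1:2]
z = [5]
set_entry!(A, 1, 1, z)                # default: defensive copy
z[1] = 9
@assert A[1, 1] == [5]                # later mutation of z does not leak into A

set_entry!(A, 2, 2, z; copy = false)  # caller vouches that sharing is safe
z[1] = 7
@assert A[2, 2] == [7]                # ...at the cost of sharing the object
```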
I think we could have some completely unaliased constructors. Would the contract of add!!(z, a, b) be that z must be completely unaliased (a, b arbitrary) and z will stay completely unaliased? Also, if z is completely unaliased, may z lose this property after f!(z, ...)?
My opinion on this is that time-sensitive routines such as polynomial and matrix arithmetic over number fields should be implemented in C. We face numerous issues by not doing this: 1) performance is poor, as per the issues leading to this PR, 2) JIT compile times are massive for components using such arithmetic implemented in Julia, 3) we have had issues previously due to matrices over number fields being the only Nemo type without a C implementation (see the similar / zero_matrix / change_base_ring issues).
However, there are two reasons we haven't done such an implementation in C: 1) many algorithms over number fields rely on much more infrastructure, all of which is currently in Hecke. No one has any desire to duplicate all that work, 2) @fieker has argued that once the code is locked away in a "black box" in C, it is hard to work on, improve, experiment. Julia has some advantages after all!
I'd personally be interested in looking into whether there is some natural separation between fundamental algorithms for matrices and polynomials over number fields that could be implemented in Antic, with everything that is less fundamental that can be built on top of Antic remaining in Julia. If so, I believe that would be the best solution, to avoid getting bitten by this sort of thing from time to time.
In the meantime, I have no objection to using the mul!! idea in Nemo, so long as it doesn't propagate. I don't want us to have the library littered with double-bang functions. I think it is a dangerous idea that can easily lead to an inconsistent system that will be hard to maintain. The whole issue of a different aliasing system for matrices vs. polynomials arises because of Flint in the first place. Therefore it makes the most sense to keep such stuff in Flint and other related C libraries, in my opinion, rather than allow it to propagate to Julia.
Of course, what you do in Hecke is none of my business. So my comments apply only to Nemo, where I really want to be conservative when it comes to changes like this.
I'm also interested in having more algorithms for number fields implemented in Antic (I have done some things in Calcium, and that's a good place to do things where a numerical embedding is useful, but clearly more basic things belong in Antic).
On Sun, Dec 06, 2020 at 01:41:54AM -0800, Fredrik Johansson wrote:

> I'm also interested in having more algorithms for number fields implemented in Antic (I have done some things in Calcium, and that's a good place to do things where a numerical embedding is useful, but clearly more basic things belong in Antic).
Clearly, having stuff in C widens the scope of applications. It also makes it more static (harder to change from the point of view of Oscar). It may make it faster (I doubt that there are substantial savings just from re-coding. Different algorithms: yes, lots of potential; otherwise, no...)
Infrastructure: one needs modular_proj/lift, i.e. given a prime number that is "easy" (not dividing any denominators), either build F_p[t]/f and work in this ring with zero divisors (lacking in Flint, I'd guess; could be a pretend fq_nmod), or factor f mod p and map into the residue fields.
For large degree f, this probably should involve remainder trees... (both for the projection and the lift)
In order to get bounds (if you want bounds), you need to be able to estimate conjugates and the constants involved in comparing conjugates and coefficients.
To deal with non-monic f, you need to have good "denominator estimates", the best I know of involve LLL of ideals.
For gcd of non-monic polynomials over a number field, you'd also need some version of "content", or lose speed.
Then there is a debate over the use of
- all factors of f mod p (results in computing whatever you want modulo some (large) integer in Z)
- only factors of degree 1 (then fq_nmod collapses to nmod thus much faster, however, the rational_reconstruction then involves LLL)
- for cyclo and quadratic: better prime selection
Those come immediately to mind, based on what we actually implemented: gcd, gcdx, res, resx, factor, root, det, solve(?), minpoly, charpoly. (Apart from factor not being optimal; the best algorithm, most likely, was only discovered afterwards.)
From the Oscar point of view, moving functionality to C does not advance us at all. From an ANTIC/Flint/Calcium (and possibly visibility) point of view this is different.
While implementing things in C may help with a special case, this still does not solve the issue here.
This is an issue in AbstractAlgebra, unrelated to Nemo or Flint. At the moment we have no way to do fast arithmetic with matrices in AbstractAlgebra. I think we have reasonable aliasing rules, which work quite well in practice. But I think that in the matrix domain we are missing the tools that allow people to write code that is on par with our competitors. I mean not for playing around in the REPL, but for doing impressive stuff performance-wise. I think we should have this, even if it is very "dangerous" for the average user.
Small anecdote. I once helped @alexjbest with https://github.com/alexjbest/Coleman.jl. After some work, we were not doing too badly compared to Magma. But we were always lagging behind, and it gets worse the larger the parameters (see the second-to-last page of https://alexjbest.github.io/papers/coleman-superelliptic.pdf). A mutating matrix multiplication/addition would have helped quite a lot, I think. I did not realise it back then, because I forgot that our in-place operations for matrices are not mutating.
Yes, I see this is a more general issue than for one specific ring.
Did you reduce to matrix multiplication? This is usually why one falls behind other implementations. It's a little hard to see how to do this generically, as the crossovers are going to be so dependent on the ring.
I'm skeptical that making a more complex generic implementation is really the answer. But I'm sympathetic to the problem (competing with other implementations), obviously.