Adding matrix norms to the `stdlib_linalg` module.
Motivation
I'm currently updating my package LightKrylov to make as much use as possible of the new linalg features offered in the `stdlib_linalg` module. Among the things I'll eventually need are functions to compute matrix norms. Is there any ongoing work at the moment from @perazz, @jvdp1, or @jalvesz on this subject, or could I make this my first task with stdlib?
Prior Art
- SciPy: `scipy.linalg.norm(A, ord=None, axis=None, keepdims=False, check_finite=True)` handles both standard vector norms as well as a variety of vector-induced and non-induced matrix norms.
- Julia: `LinearAlgebra.norm(x, p)` where `x` is an `n`-vector returns the $p$-norm of this vector.
- Julia: `LinearAlgebra.norm(A, p)` where `A` is an `m x n` matrix returns the "entry-wise" $p$-norm of `A`.
- Julia: `LinearAlgebra.opnorm(A, p)` where `A` is an `m x n` matrix returns the vector-induced $p$-norm of `A`.
Additional Information
While most people might be accustomed to the SciPy standard, my personal preference would still be the Julia approach of separating the true vector-induced norms (i.e. 1-norm, 2-norm, $\infty$-norm) from the entry-wise ones (e.g. Frobenius), although I'm obviously open to discussion.
Great idea @loiseaujc. Right now I'm trying to tackle the decompositions (pseudo-inverse, Cholesky, QR) and I was planning to address norms and the condition number next. So, your contribution would be very welcome!
As a way to distinguish between the two approaches to norms, would there be a way to use the same `norm` interface for all, but with different arguments maybe? I was loosely thinking that something like `norm(A, 2)` vs. `norm(A, 'Frobenius')` could work.
We probably could. Julia is more exhaustive than SciPy for this. Having `norm` and `opnorm` allows them to have both the element-wise p-norm (for arbitrary p) and the induced one (for p = 1, 2 and infinity). SciPy is restricted to the standard operator-induced norms plus the Frobenius one. I think it mostly depends on how exhaustive we want to be (and how often the non-Frobenius element-wise norms are actually being used, which I think is not much).
The more I think about it, the more it feels like the Julia way may actually cover more use-cases than are actually used in practice. I still like the difference between `norm` and `opnorm` though, just to remind the user that these are fundamentally different mathematical objects. Having the two interfaces might also make it more transparent whether a 2D array represents an actual matrix or a collection of vectors, instead of having to play with the `axis` keyword.
I took some time to think about all the different norms (both for vectors and matrices) I've ever needed in my scientific computing adventure over the past 15 years or so. Here is a pretty standard list (+ whatever scipy/Julia are offering).
Vector norms
- `norm(x, 1)`: 1-norm of a vector (i.e. sum of absolute values),
- `norm(x, 2)`: 2-norm of a vector (i.e. standard Euclidean norm),
- `norm(x, inf)`: $\infty$-norm of a vector (i.e. maximum absolute value),
- `norm(x, p)`: $p$-norm of a vector.
Among these four, the Euclidean 2-norm is by far the most popular and, more often than not, is what people mean when discussing the norm of a vector $\mathbf{x}$.
I've used the other three quite regularly as well, but more from a convex optimization perspective. I hardly ever had to actually compute them, but they're simple enough to implement and I believe they're something everybody expects from any standard linear algebra module.
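For reference, here is the general $p$-norm all of these specialize, with the $\infty$-norm as its limiting case:

$$ \Vert \mathbf{x} \Vert_p = \left( \sum_{i=1}^n \vert x_i \vert^p \right)^{1/p}, \qquad \Vert \mathbf{x} \Vert_\infty = \lim_{p \to \infty} \Vert \mathbf{x} \Vert_p = \max_i \vert x_i \vert. $$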
Matrix norms
- `opnorm(A, 1)`: 1-norm of a matrix (i.e. maximum absolute column sum),
- `opnorm(A, 2)`: 2-norm of a matrix (i.e. largest singular value),
- `opnorm(A, inf)`: $\infty$-norm of a matrix (i.e. maximum absolute row sum),
- `opnorm(A, "nuclear")`: nuclear norm (i.e. sum of singular values),
- `opnorm(A, "fro")`: Frobenius norm.
Like everyone, I use the Frobenius norm all the time. Similarly, the 2-norm and the nuclear norm are used quite extensively in model order reduction. As before, the 1-norm and the $\infty$-norm, although I don't quite know where they are being used, are things everybody would expect from a standard linalg module.
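To make the distinction with the entry-wise norms explicit, the first three are the vector-induced (operator) norms, while the Frobenius norm is simply the 2-norm of the matrix entries:

$$ \Vert \mathbf{A} \Vert_p = \max_{\mathbf{x} \neq 0} \frac{\Vert \mathbf{A} \mathbf{x} \Vert_p}{\Vert \mathbf{x} \Vert_p}, \qquad \Vert \mathbf{A} \Vert_F = \sqrt{\sum_{i,j} \vert a_{ij} \vert^2}. $$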
There are a bunch of other matrix-related norms that I have used or seen being used over the years but they are more of a niche thing. These include:
- `schatten_norm(A, p)`: Schatten $p$-norm (i.e. the vector $p$-norm applied to the singular values of `A`).
- `lognorm(A, 2)`: log-norm of a matrix, also known as its numerical abscissa and defined as $\lambda_{\max} \left( \frac{\mathbf{A} + \mathbf{A}^H}{2} \right)$.
Among these two, the lognorm is probably the one I've used most often. It is related to the non-normality of the matrix and proves important when studying linear time invariant systems of the form $\dot{x} = Ax + Bu$.
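Written out, with $\sigma_i$ the singular values of $\mathbf{A}$, the Schatten $p$-norm reads

$$ \Vert \mathbf{A} \Vert_{S_p} = \left( \sum_i \sigma_i^p \right)^{1/p}, $$

so $p = 1$ recovers the nuclear norm, $p = 2$ the Frobenius norm, and $p \to \infty$ the operator 2-norm.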
Here are my thoughts (in no particular order) on the matter:
- Even if it goes against the `scipy` notation, I kind of prefer the `norm`/`opnorm` dichotomy as it makes `norm(x, 2)` and `opnorm(x, 2)` very clear from just reading the code, without having to check the declarations to see if `x` is a rank-1 or rank-2 array.
- `norm` and `opnorm` should return the Euclidean and Frobenius norms by default, respectively.
- The `schatten_norm` and `lognorm` could be added, although they wouldn't be very high on my priority list.
- As a starting point, the implementations should focus on general matrices. Specialized computations for, say, Hessenberg matrices, circulant matrices, etc. could be included later on. Since they're all standard rank-2 arrays, we can't naturally dispatch based on the matrix type, and the simplest would be to write a master subroutine with `is_symmetric(A)`, `is_hessenberg(A)`, etc. dispatching to the specialized solvers. If you know of any better/more Fortranic way, I'm all ears.
- On the development and mathematical sides, I would restrict `norm` to take only a rank-1 array as argument. This would mean less code bloat to handle all the corner cases `scipy` is handling (i.e. whether the norm is applied on the whole array, only along the rows, or only along the columns). On the other hand, this would force the user to write a `for` loop to compute the norm of a collection of vectors represented as a rank-2 array, which would be a major departure from the `numpy`/`scipy` standard.
I tend to think that being strict about the rank-1 array argument for norm is a better scientific/programming practice than what scipy currently offers. It doesn't leave any room for interpretation. Forcing the user to write a for loop for a collection of vectors might also make the code more readable and less error-prone (although it may possibly cause some performance loss (?), I don't know).
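For instance, the norms of a collection of vectors stored as the columns of a rank-2 array would then be computed with an explicit loop. A minimal sketch, assuming the rank-1 `norm` interface discussed above (the `use` of `stdlib_linalg` is hypothetical at this point):

program column_norms
    use stdlib_kinds, only: dp
    use stdlib_linalg, only: norm   !> hypothetical: the rank-1 interface proposed here
    implicit none
    real(dp) :: X(10, 5), norms(5)
    integer :: j
    call random_number(X)
    !> Explicit loop over the columns, each treated as a vector.
    do j = 1, size(X, 2)
        norms(j) = norm(X(:, j), 2)
    end do
    print *, norms
end program column_norms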
@perazz: On a side note, do you have a publicly available roadmap for the development of the stdlib_linalg module? I have seen some discussions on the fortran-lang Discourse but nothing centralized. Translating/adapting everything scipy.linalg is offering is too much for a single person, even a one-man team. I suppose, however, that we all have bits and pieces scattered around our different code bases (e.g. I have some sketches for schur, expm, hankel, toeplitz, etc.).
@loiseaujc I basically agree with everything you said! I just have one question regarding the name `opnorm`: is it for "operator norm", as in the matrix being an operator on a vector space? Could there be a more "explicit" name to make the distinction? In any case, I totally agree that it is good to have two well-distinguished interfaces. If not, `opnorm` is just fine.
Regarding your query about the type of matrix, the linalg module (https://stdlib.fortran-lang.org/page/specs/stdlib_linalg.html) already provides several checks, such as https://stdlib.fortran-lang.org/page/specs/stdlib_linalg.html#is_hermitian-checks-if-a-matrix-is-hermitian and https://stdlib.fortran-lang.org/page/specs/stdlib_linalg.html#is_hessenberg-checks-if-a-matrix-is-hessenberg, among others.
> I tend to think that being strict about the rank-1 array argument for norm is a better scientific/programming practice than what scipy currently offers. It doesn't leave any room for interpretation. Forcing the user to write a for loop for a collection of vectors might also make the code more readable and less error-prone (although it may possibly cause some performance loss (?), I don't know).
Agree, and I don't think that in Fortran you would have a performance loss (that would be true in Python). If the implementations and interfaces are well designed, the compilers might even be able to vectorize nicely. One thing to consider, though, would be to have at least dimensions 2, 3 and 4 with explicit unrolled implementations and then a generic one for d > 4.
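Something roughly like this for the small fixed sizes (a rough sketch, the name is hypothetical):

!> Explicitly unrolled 2-norm for fixed-size 3-vectors; sizes 2 and 4 would get
!> the same treatment, with a generic loop-based implementation for n > 4.
pure function norm2_unrolled_3(x) result(out)
    use stdlib_kinds, only: dp
    real(dp), intent(in) :: x(3)
    real(dp) :: out
    out = sqrt(x(1)*x(1) + x(2)*x(2) + x(3)*x(3))
end function norm2_unrolled_3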
I've used norm and opnorm in the discussion because that is what is being used in Julia. opnorm indeed stands for operator norm. While using norm for vectors is pretty obvious, I don't have a strong opinion regarding opnorm (although I don't have a better alternative to propose).
Thanks @loiseaujc for the detailed comments. I mostly agree with your ideas and I would summarize the consensus so far as follows:
- let's have two separate functions for vector (`norm`) and matrix (`opnorm`) norms;
- let's start from general matrix types, and then we can extend them later.
Here are some further thoughts I'm having about the design.
- The Fortran standard has `norm2`, which works for both arrays and matrices with the optional `dim` specification. We should find a way to be sure our notation is not easily confused with this.
- We should find a unique way to express the norm type. For example, could it be that we require a `character` input for all norms, instead of a numeric one? I'm saying so because I'd personally rather deal with something like `norm(x, '2')`, `norm(x, 'L2')`, `norm(x, 'inf')` than having to do `norm(x, ieee_value(0.0, ieee_positive_inf))`. So I think the code would be clear for both the user and the developer:
character(*), optional, intent(in) :: norm_type
select case (norm_type)
case ('2','Euclidean')
...
case ('inf','Inf','Infinity')
...
end select
- For the (later) specialized matrix norms, I would think that an optional additional flag like `is_hessenberg=.true.` or `matrix_form='Hessenberg'` is probably the most performant design, because we can't really check whether the matrix complies with any special properties at all times; it would be too expensive.
> - The Fortran standard has `norm2`, which works for both arrays and matrices with the optional `dim` specification. We should find a way to be sure our notation is not easily confused with this.
I shall be able to start working on this by the end of next week. I'll start with the baseline implementation and will see from there how we can improve and make sure it is not confounded with norm2.
> - We should find a unique way to express the norm type. For example, could it be that we require a `character` input for all norms, instead of a numeric one? [...]
I like this.
> - For the (later) specialized matrix norms, I would think that an optional additional flag like `is_hessenberg=.true.` or `matrix_form='Hessenberg'` is probably the most performant design, because we can't really check whether the matrix complies with any special properties at all times; it would be too expensive.
Agreed. I prefer `matrix_form = "Hessenberg"` rather than `is_hessenberg = .true.`. I may be naïve, but the second would incur a lot of optional args (`is_hessenberg`, `is_symmetric`, `is_triangular`, etc.) of which only one can be set to `.true.`. The `matrix_form = <string>` option seems easier to handle and, if not specified, `opnorm` defaults to the computation using the general matrix approach.
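Concretely, I would imagine a signature along these lines (purely a sketch, names hypothetical; the Frobenius default follows the earlier discussion):

interface opnorm
    !> Hypothetical double-precision instance; dp would come from stdlib_kinds.
    module function stdlib_linalg_opnorm_d(A, which, matrix_form) result(out)
        import :: dp
        !> Input matrix A(m, n).
        real(dp), intent(in) :: A(:,:)
        !> [optional] Which operator norm ("1", "2", "inf", "fro", ...); defaults to Frobenius.
        character(len=*), optional, intent(in) :: which
        !> [optional] Special matrix structure, e.g. "Hessenberg"; defaults to a general matrix.
        character(len=*), optional, intent(in) :: matrix_form
        real(dp) :: out
    end function stdlib_linalg_opnorm_d
end interface opnorm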
> I prefer the `matrix_form = "Hessenberg"`
Absolutely agree, and it would be more in line with the `norm_type` argument.
Here is a draft/hacky implementation of norm I'd like to discuss a bit before drafting the PR.
#:for rk, rt, ri in RC_KINDS_TYPES
#:if rk !="xdp"
module function stdlib_linalg_norm_${ri}$(x, which) result(out)
!> Input vector x[n]
${rt}$, intent(in) :: x(:)
!> [optional] Which vector norm is being computed.
character(len=*), optional, intent(in) :: which
!> Norm of the vector.
real(${rk}$) :: out
!> Internal variables.
integer(ilp) :: n, idx
character(len=:), allocatable :: which_
!> Dimension of the vector.
n = size(x, kind=ilp)
!> Default: Euclidean 2-norm.
which_ = optval(which, "2") ; which_ = to_lower(which_)
!> Dispatch computation to the appropriate driver.
select case (which_)
case ("2", "euclidean", "l2")
out = nrm2(n, x, 1_ilp)
case ("1", "l1")
#:if rt[0] == "r"
out = stdlib_${rk[0]}$asum(n, x, 1_ilp)
#:elif rt[0] == "c" and rk[0] == "s"
out = stdlib_scsum1(n, x, 1_ilp)
#:elif rt[0] == "c" and rk[0] == "d"
out = stdlib_dzsum1(n, x, 1_ilp)
#:elif rt[0] == "c" and rk[0] == "q"
out = stdlib_qzsum1(n, x, 1_ilp)
#:endif
case ("inf")
#:if rt[0] == "r"
idx = stdlib_i${rk[0]}$amax(n, x, 1_ilp)
#:else
idx = stdlib_i${ri[0]}$max1(n, x, 1_ilp)
#:endif
out = abs(x(idx))
case default
stop
end select
end function stdlib_linalg_norm_${ri}$
#:endif
#:endfor
After extending the `nrm2` interface to complex-valued vectors, the code compiles correctly. I haven't implemented the tests yet, but I don't see any obvious errors. I do have a few questions, though, to polish it and make it more fypp-/stdlib-compliant. There are a couple of things I'm not quite sure how to handle properly:
- At the moment, we have the $\ell_2$ norm (default), the $\ell_1$ norm, as well as the infinity norm. In my original post, I discussed the possibility of also having a generic $p$-norm. Since `which` is of type `character(len=*)`, we would need to check that it does not correspond to a case already handled by the dispatch, that `which` is one character long (assuming we only consider integer $p$), convert the character to a `float` (either single or double) and then compute $\sqrt[p]{\sum \vert x_i \vert^p}$. This seems unnecessarily complicated for a relatively edge case. A simple option however would be to have a dedicated `stdlib_linalg_pnorm_${ri}$` function for which `which` is actually a non-negative `float` or `integer`, and put it under the same interface as `stdlib_linalg_norm_${ri}$`. What would be your take on this?
- Second point has to do with the 1-norm computation. Is there a more fypp-esque way to handle the different types? The problem I see is that while `asum` is defined for both `real` and `complex`, `sum1` is only defined for `complex`. I hesitate on whether I should keep the code as it is, or if we should create an interface for `asum`, as well as one for `sum1` which under the hood also calls `asum` if `x` is real. Basically something like this (don't pay attention to the syntax):
interface asum
    module procedure rsasum
    module procedure csasum
    ...
end interface asum
and
interface sum1
    module procedure rsasum
    module procedure csum1
    ...
end interface sum1
asum for real-valued vectors would basically end up being referenced by two different interfaces. I'm not sure whether this is the cleanest thing to do although it would somehow make some sense from a mathematical point of view.
- Last point, finally: I'm not sure how to properly handle the case where `which` is none of the cases covered by the dispatch (e.g. the user made a typo).
I'll wait for your feedback for now and will start implementing the tests and documentation in the meantime.
> A simple option however would be to have a dedicated `stdlib_linalg_pnorm_${ri}$` function for which `which` is actually a non-negative `float` or `integer`, and put it under the same interface as `stdlib_linalg_norm_${ri}$`. What would be your take on this?
This sounds good! Another option, if you prefer to keep a single interface:
use stdlib_str2num, only: to_num
use stdlib_error, only: error_stop
...
case default
p = to_num( which_ , p )
if( p > 2 .and. p < p_limit ) then !> define a limit p value
!> compute the p-norm
else
call error_stop( "Invalid p-norm",code )
end if
If you prefer to also expose an interface that takes an integer p value, I would then suggest making that the base implementation, and having the interface that takes a character "just" handle the conversion and then call the one taking an integer value.
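In other words, something like this (a rough sketch with hypothetical names, using `to_num` from `stdlib_str2num` for the conversion):

module pnorm_sketch
    use stdlib_kinds, only: dp
    use stdlib_str2num, only: to_num
    implicit none
contains

    !> Base implementation: the integer-p version does the actual computation.
    pure function pnorm(x, p) result(out)
        real(dp), intent(in) :: x(:)
        integer, intent(in) :: p
        real(dp) :: out
        out = sum(abs(x)**p)**(1.0_dp/p)
    end function pnorm

    !> Character-based interface: parse "3", "10", ... and delegate to pnorm.
    function pnorm_char(x, which) result(out)
        real(dp), intent(in) :: x(:)
        character(len=*), intent(in) :: which
        real(dp) :: out
        integer :: p
        p = to_num(which, p)
        out = pnorm(x, p)
    end function pnorm_char

end module pnorm_sketch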
> Here is a draft/hacky implementation of `norm` I'd like to discuss a bit before drafting the PR.
looks good so far!
> - At the moment, we have the $\ell_2$ norm (default), the $\ell_1$ norm, as well as the infinity norm. In my original post, I discussed the possibility of also having a generic $p$-norm. [...] A simple option however would be to have a dedicated `stdlib_linalg_pnorm_${ri}$` function for which `which` is actually a non-negative `float` or `integer`, and put it under the same interface as `stdlib_linalg_norm_${ri}$`. What would be your take on this?
What I like the most is to have a separate function to do the option parsing, so the input could be either integer or character, as @jalvesz suggested. Something like:
pure subroutine parse_norm(which, norm_type, err)
    character(*), intent(in) :: which
    integer(ilp), intent(out) :: norm_type
    type(linalg_state_type), intent(out) :: err
    !> Candidate integer p-norm parsed from the input string.
    integer(ilp) :: p

    select case (to_lower(which))
       case ("euclidean","2","l2")
           norm_type = NORM_TYPE_L2
       case ("inf","infinity")
           norm_type = NORM_TYPE_INF
       case default
           p = to_num(which, p)
           if (p>2 .and. p<=NORM_TYPE_MAX) then
              norm_type = p
           else
              norm_type = NORM_TYPE_ERROR
              err = linalg_state_type("norm",LINALG_VALUE_ERROR,"invalid norm type: ",which)
           endif
    end select
end subroutine parse_norm
The rule could be to define parameters that are >= 1 for the p-norms and <= 0 for the other norms, i.e.:
integer(ilp), parameter :: NORM_TYPE_ERROR = -huge(0_ilp)
integer(ilp), parameter :: NORM_TYPE_FROB = -1
integer(ilp), parameter :: NORM_TYPE_INF = -2
integer(ilp), parameter :: NORM_TYPE_L1 = 1
integer(ilp), parameter :: NORM_TYPE_L2 = 2
integer(ilp), parameter :: NORM_TYPE_MAX = 100
> - Second point has to do with the 1-norm computation. Is there a more fypp-esque way to handle the different types? The problem I see is that while `asum` is defined for both `real` and `complex`, `sum1` is only defined for `complex`. [...]
For the BLAS/LAPACK functions that have different names for real/complex types, I've been using inline fypp declarations, i.e.:
https://github.com/fortran-lang/stdlib/blob/d996e436d38d8aa8b47749cc3d93d02793d9ae2f/src/stdlib_linalg_svd.fypp#L266
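As a rough illustration (untested, and assuming `rt`/`rk` carry the same type/kind strings as in your draft), something like this could collapse the nested `#:if` blocks for the 1-norm into a single line:

#! Sketch: pick the 1-norm routine name inline; real kinds map to *asum,
#! complex kinds map to the sc/dz/qz sum1 variants.
#:set SUM1_PREFIX = {'s': 'sc', 'd': 'dz', 'q': 'qz'}
out = stdlib_${rk[0] + 'asum' if rt[0] == 'r' else SUM1_PREFIX[rk[0]] + 'sum1'}$(n, x, 1_ilp)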
> - Last point finally is that I'm not sure how to properly handle the case where `which` is none of the cases covered by the dispatch (e.g. the user made a typo).
You can look at the `linalg_state_type` model, which is the common way all linear algebra routines handle exceptions. This is typically an `optional, intent(out)` argument that is internally handled by a final call to:
https://github.com/fortran-lang/stdlib/blob/d996e436d38d8aa8b47749cc3d93d02793d9ae2f/src/stdlib_linalg_svd.fypp#L287
This forces the code to either return an error code (hopefully LINALG_SUCCESS) to the user, or stop the program on an exception.
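For illustration, a rough sketch of how that pattern could look inside `norm` (the subroutine name is hypothetical, and I'm assuming the usual names from the state-handling module):

!> Sketch of the optional error-state pattern; only the dispatch on `which` is shown.
subroutine norm_error_demo(which, err)
    use stdlib_linalg_state, only: linalg_state_type, linalg_error_handling, &
        LINALG_VALUE_ERROR
    character(len=*), intent(in) :: which
    !> Optional user-facing error state: if absent, errors stop the program.
    type(linalg_state_type), optional, intent(out) :: err
    !> Internal error state.
    type(linalg_state_type) :: err0

    select case (which)
       case ("2", "euclidean")
          !> ... compute the norm ...
       case default
          err0 = linalg_state_type("norm", LINALG_VALUE_ERROR, "invalid norm type: ", which)
    end select

    !> Either copy err0 into err, or hard-stop on error if err was not provided.
    call linalg_error_handling(err0, err)
end subroutine norm_error_demo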