SparseArrays.jl
SparseArrays.jl copied to clipboard
Rename fields of `SparseMatrixCSC`
Currently the 5 fields are named:
mncolptrrowvalnzval
My complaints are:
- "row" and "col" are used in two of the fields, but not their respective sizes (
mandn) rowvalis confusing (what are "row values"?)colptr: isn't a pointer a C thing? we don't use it anywhere else in the languagenzval: don't we now allow zeros in the non-sparse elements?
I would propose
nrowncolcoloffsetsrowindicesvalues
I wholeheartedly sympathize. Could we postpone this discussion till after the release though? I have been meaning to write up some thoughts on the matter, but cannot spare the time for the foreseeable future :). Best!
Yes, once we have getfield overloading, any changes here should be non-breaking.
Yes, once we have getfield overloading, any changes here should be non-breaking.
That upside of getfield overloading had not occurred to me. Fantastic! π
That upside of getfield overloading had not occurred to me. Fantastic! π
Oh man, that's one of the main benefits βΒ it decouples "virtual fields" from actual fields.
nrow-> rows?
why not row_num and col_num?
I would call it nrows; this is a fairly common naming pattern for us already, ndims, etc.
Linking https://github.com/JuliaLang/julia/issues/24910#issuecomment-349699740. Best!
Are these fields meant to be public or private? Before this very essential question is answered its difficult to discuss.
m and n are definitely private fields. One uses size to get the sizes. IMHO also all other fields are private. There is a publicrowvals function, i.e. it is clear that .rowvals is private. But it seems that a corresponding method for colptr is missing.
That upside of getfield overloading had not occurred to me. Fantastic!
In principle yes, but I am pretty sure that you don't want to use getfield overloading to maintain compatibility of internal interfaces.
Currently, you need to use the fields to do almost any efficient sparse matrix manipulations. So they are de facto public.
Could you make this please more precise? Which fields are public? -> We really need a clear communication (in the documentation) here.
I'm fine with changing the names but it would better if we could generally use getters as in https://github.com/JuliaLang/julia/blob/292a3b4c73b91bcb3dbe2d9d7c56f8d747ce5207/base/sparse/sparsematrix.jl#L48-L53 since it would allow many algorithms to work efficiently for views of full columns of sparse matrices.
The sparse higher order functions code takes a similar approach https://github.com/JuliaLang/julia/blob/292a3b4c73b91bcb3dbe2d9d7c56f8d747ce5207/base/sparse/higherorderfns.jl#L31 to provide a uniform interface to sparse vectors and matrices, and perhaps other (e.g. structured) objects as well going forward. My thinking on this front has been motivated by cleaning that interface up. Best!
That would be fine, but those functions are neither exported nor documented.
I think that for now we should consider these fields to be private even though library code has been effectively forced to use them in order to write faster sparse code. Change the names to better names and also provide an official API for accessing what one needs to write fast sparse code. This may cause some breakage; we can either use the new getproperty feature to make the old field names work (possibly with deprecations) or we can just proactively fix broken uses of these fields.
Hi, I'm proposing a new AbstractSparseMatrixCSC interface in PR JuliaLang/julia#33054. We need to document the accessor functions so people here may be interested in the discussion there.
I am going to close this issue and request discussion in JuliaLang/julia#33054. This is an opportunity to use better names for the accessors.
Let me summarize the recent changes I made related to this issue:
In JuliaLang/julia#30173, @ViralBShah and I were discussing the way to expose SparseMatrixCSC functionality in such a way that it is easily extensible by the users. We came to the agreement that defining SparseMatrixCSC functions in terms of accessor methods and abstract type is the way to go.
In JuliaLang/julia#32953 and JuliaLang/julia#33039 I did the refactoring so that most of the code touching SparseMatrixCSC is now defined in terms of AbstractSparseMatrixCSC and its accessor functions. At this point, nothing is breaking (or at least that's my intention) and pre-existing accessor functions (e.g. getcolptr and rowvals) are used as-is.
Now in JuliaLang/julia#33054 we are discussing how to document the interface of AbstractSparseMatrixCSC (e.g., accessor functions and assumed invariance of the underlying data). As getcolptr is not exported and documented, it is a good chance to use a better name. Accessor function coloffsets(::AbstractSparseMatrixCSC) sounds like a very good option.
A more controversial part of the discussion is that if we should add new accessor functions like rowindices and declare the old existing accessor functions like rowvals to be aliases of them (https://github.com/JuliaLang/julia/pull/33054#discussion_r317382051). It is totally backward compatible change. The benefit of doing so now is that the code written after Julia 1.4 can be forward compatible with Julia 2.0 if we do this now (especially the subtypes of AbstractSparseMatrixCSC). Of course, we need to consider the balance of the pros and cons. I can imagine that there will be some confusions due to the coexistence of old and new APIs. IIUC, that's the main concern @ViralBShah has.
Questions I'd like to ask here are:
-
What accessor function should we use for
colptr? Should it becoloffsets? Should we just document and exportgetcolptr? -
Should we introduce new accessors for
rowvalandnzval? They are calledrowvalsandnonzerosnow but they can be better. Alternative new APIs arerowindicesproposed by @simonbyrne in the OP andstoredvalsproposed by @ViralBShah https://github.com/JuliaLang/julia/pull/33054#discussion_r317381459.
Note that m and n are accessible with size(A) and size(A, i). I suppose there will be no argument against it.
Cross ref. https://github.com/JuliaLang/julia/issues/31330#issuecomment-473575718 here too :).
Thanks. So @Sacha0, @ViralBShah, and I all seem to think that it makes sense to use storedindices or storeinds for SparseVector if nonzeros is renamed to storedvals.
@Sacha0's comment also mentions SparseMatrixCSR. I don't know if we want to take immediate action but we can also rename nzrange to something to be consistent with the notion of "stored" vs "zeros" and also maybe make sure to use a name that indicates that the first argument is CSC. ref: JuliaLang/SparseArrays.jl#50
That has been the direction we have been moving in.
So should we adopt the new names, and support the old names with properties (until 2.0 rolls around)? The accessors can also be appropriately named then - and the deprecations can be carried out whenever 2.0 comes around.
The only absolute blocker for AbstractSparseMatrixCSC JuliaLang/julia#33054 is the name of accessor function for field .colptr. Other accessor functions already exist as public API so we have to keep (at least) aliases anyway.
Can this be discussed in triage? I think the desired outcome is one of the following:
-
Triage decides the name of the public accessor function for field
.colptr. Candidates are:coloffsetsgetcolptr(existing but undocumented internal function)
-
Triage delegates the decision to members whoever active in this issue.