SparseArrays.jl icon indicating copy to clipboard operation
SparseArrays.jl copied to clipboard

Rename fields of `SparseMatrixCSC`

Open simonbyrne opened this issue 7 years ago β€’ 24 comments

Currently the 5 fields are named:

  • m
  • n
  • colptr
  • rowval
  • nzval

My complaints are:

  1. "row" and "col" are used in two of the fields, but not their respective sizes (m and n)
  2. rowval is confusing (what are "row values"?)
  3. colptr: isn't a pointer a C thing? we don't use it anywhere else in the language
  4. nzval: don't we now allow zeros in the non-sparse elements?

I would propose

  • nrow
  • ncol
  • coloffsets
  • rowindices
  • values

simonbyrne avatar Dec 15 '17 21:12 simonbyrne

I wholeheartedly sympathize. Could we postpone this discussion till after the release though? I have been meaning to write up some thoughts on the matter, but cannot spare the time for the foreseeable future :). Best!

Sacha0 avatar Dec 15 '17 21:12 Sacha0

Yes, once we have getfield overloading, any changes here should be non-breaking.

simonbyrne avatar Dec 15 '17 21:12 simonbyrne

Yes, once we have getfield overloading, any changes here should be non-breaking.

That upside of getfield overloading had not occurred to me. Fantastic! πŸ˜„

Sacha0 avatar Dec 15 '17 21:12 Sacha0

That upside of getfield overloading had not occurred to me. Fantastic! πŸ˜„

Oh man, that's one of the main benefits – it decouples "virtual fields" from actual fields.

StefanKarpinski avatar Dec 15 '17 22:12 StefanKarpinski

nrow-> rows?

dpsanders avatar Dec 16 '17 00:12 dpsanders

why not row_num and col_num?

oscardssmith avatar Dec 16 '17 00:12 oscardssmith

I would call it nrows; this is a fairly common naming pattern for us already, ndims, etc.

StefanKarpinski avatar Dec 16 '17 03:12 StefanKarpinski

Linking https://github.com/JuliaLang/julia/issues/24910#issuecomment-349699740. Best!

Sacha0 avatar Dec 16 '17 18:12 Sacha0

Are these fields meant to be public or private? Before this very essential question is answered its difficult to discuss.

m and n are definitely private fields. One uses size to get the sizes. IMHO also all other fields are private. There is a publicrowvals function, i.e. it is clear that .rowvals is private. But it seems that a corresponding method for colptr is missing.

That upside of getfield overloading had not occurred to me. Fantastic!

In principle yes, but I am pretty sure that you don't want to use getfield overloading to maintain compatibility of internal interfaces.

tknopp avatar Dec 17 '17 08:12 tknopp

Currently, you need to use the fields to do almost any efficient sparse matrix manipulations. So they are de facto public.

KristofferC avatar Dec 17 '17 08:12 KristofferC

Could you make this please more precise? Which fields are public? -> We really need a clear communication (in the documentation) here.

tknopp avatar Dec 17 '17 09:12 tknopp

I'm fine with changing the names but it would better if we could generally use getters as in https://github.com/JuliaLang/julia/blob/292a3b4c73b91bcb3dbe2d9d7c56f8d747ce5207/base/sparse/sparsematrix.jl#L48-L53 since it would allow many algorithms to work efficiently for views of full columns of sparse matrices.

andreasnoack avatar Dec 17 '17 18:12 andreasnoack

The sparse higher order functions code takes a similar approach https://github.com/JuliaLang/julia/blob/292a3b4c73b91bcb3dbe2d9d7c56f8d747ce5207/base/sparse/higherorderfns.jl#L31 to provide a uniform interface to sparse vectors and matrices, and perhaps other (e.g. structured) objects as well going forward. My thinking on this front has been motivated by cleaning that interface up. Best!

Sacha0 avatar Dec 17 '17 18:12 Sacha0

That would be fine, but those functions are neither exported nor documented.

simonbyrne avatar Dec 17 '17 21:12 simonbyrne

I think that for now we should consider these fields to be private even though library code has been effectively forced to use them in order to write faster sparse code. Change the names to better names and also provide an official API for accessing what one needs to write fast sparse code. This may cause some breakage; we can either use the new getproperty feature to make the old field names work (possibly with deprecations) or we can just proactively fix broken uses of these fields.

StefanKarpinski avatar Dec 17 '17 22:12 StefanKarpinski

Hi, I'm proposing a new AbstractSparseMatrixCSC interface in PR JuliaLang/julia#33054. We need to document the accessor functions so people here may be interested in the discussion there.

tkf avatar Aug 24 '19 20:08 tkf

I am going to close this issue and request discussion in JuliaLang/julia#33054. This is an opportunity to use better names for the accessors.

ViralBShah avatar Aug 25 '19 04:08 ViralBShah

Let me summarize the recent changes I made related to this issue:

In JuliaLang/julia#30173, @ViralBShah and I were discussing the way to expose SparseMatrixCSC functionality in such a way that it is easily extensible by the users. We came to the agreement that defining SparseMatrixCSC functions in terms of accessor methods and abstract type is the way to go.

In JuliaLang/julia#32953 and JuliaLang/julia#33039 I did the refactoring so that most of the code touching SparseMatrixCSC is now defined in terms of AbstractSparseMatrixCSC and its accessor functions. At this point, nothing is breaking (or at least that's my intention) and pre-existing accessor functions (e.g. getcolptr and rowvals) are used as-is.

Now in JuliaLang/julia#33054 we are discussing how to document the interface of AbstractSparseMatrixCSC (e.g., accessor functions and assumed invariance of the underlying data). As getcolptr is not exported and documented, it is a good chance to use a better name. Accessor function coloffsets(::AbstractSparseMatrixCSC) sounds like a very good option.

A more controversial part of the discussion is that if we should add new accessor functions like rowindices and declare the old existing accessor functions like rowvals to be aliases of them (https://github.com/JuliaLang/julia/pull/33054#discussion_r317382051). It is totally backward compatible change. The benefit of doing so now is that the code written after Julia 1.4 can be forward compatible with Julia 2.0 if we do this now (especially the subtypes of AbstractSparseMatrixCSC). Of course, we need to consider the balance of the pros and cons. I can imagine that there will be some confusions due to the coexistence of old and new APIs. IIUC, that's the main concern @ViralBShah has.

tkf avatar Aug 25 '19 18:08 tkf

Questions I'd like to ask here are:

  1. What accessor function should we use for colptr? Should it be coloffsets? Should we just document and export getcolptr?

  2. Should we introduce new accessors for rowval and nzval? They are called rowvals and nonzeros now but they can be better. Alternative new APIs are rowindices proposed by @simonbyrne in the OP and storedvals proposed by @ViralBShah https://github.com/JuliaLang/julia/pull/33054#discussion_r317381459.

Note that m and n are accessible with size(A) and size(A, i). I suppose there will be no argument against it.

tkf avatar Aug 25 '19 18:08 tkf

Cross ref. https://github.com/JuliaLang/julia/issues/31330#issuecomment-473575718 here too :).

Sacha0 avatar Aug 26 '19 16:08 Sacha0

Thanks. So @Sacha0, @ViralBShah, and I all seem to think that it makes sense to use storedindices or storeinds for SparseVector if nonzeros is renamed to storedvals.

@Sacha0's comment also mentions SparseMatrixCSR. I don't know if we want to take immediate action but we can also rename nzrange to something to be consistent with the notion of "stored" vs "zeros" and also maybe make sure to use a name that indicates that the first argument is CSC. ref: JuliaLang/SparseArrays.jl#50

tkf avatar Aug 27 '19 02:08 tkf

That has been the direction we have been moving in.

ViralBShah avatar Aug 27 '19 02:08 ViralBShah

So should we adopt the new names, and support the old names with properties (until 2.0 rolls around)? The accessors can also be appropriately named then - and the deprecations can be carried out whenever 2.0 comes around.

ViralBShah avatar Aug 31 '19 07:08 ViralBShah

The only absolute blocker for AbstractSparseMatrixCSC JuliaLang/julia#33054 is the name of accessor function for field .colptr. Other accessor functions already exist as public API so we have to keep (at least) aliases anyway.

Can this be discussed in triage? I think the desired outcome is one of the following:

  • Triage decides the name of the public accessor function for field .colptr. Candidates are:

    • coloffsets
    • getcolptr (existing but undocumented internal function)
  • Triage delegates the decision to members whoever active in this issue.

tkf avatar Nov 22 '19 05:11 tkf