SparseArrays.jl
Deprecate field access of SparseMatrixCSC?
In JuliaLang/julia#33054 I proposed exposing an AbstractSparseMatrixCSC interface so that package developers can easily write SparseMatrixCSC-like custom matrices (without having to implement all the complex functions, including the broadcasting machinery). @ViralBShah was asking in https://github.com/JuliaLang/julia/pull/32953#issuecomment-524173844 if it makes sense to deprecate field access. The rationale is that it pushes package authors to use the AbstractSparseMatrixCSC interface so that subtypes of it other than SparseArrays.SparseMatrixCSC would work with their code. This is technically OK since those fields are not part of the public API. (Edit: They may not be considered completely private. See: https://github.com/JuliaLang/julia/issues/33056#issuecomment-524583635)
As accessors like nonzeros have existed for more than 4 years, since Julia 0.4 (#8720), I'd assume that all existing, maintained code bases have already migrated to the accessor methods. If that's the case, this deprecation would not be very destructive (although its effect would also be small).
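For readers unfamiliar with the accessor style, here is a minimal sketch of code that never touches the fields. It uses only the exported `nonzeros`, `rowvals`, and `nzrange` functions; the helper name `column_sum` and the example matrix are made up for illustration.

```julia
using SparseArrays

# Hypothetical helper: sum the stored entries of column j
# using only the exported accessor functions.
function column_sum(A::SparseMatrixCSC, j::Integer)
    vals = nonzeros(A)          # stored values (what the nzval field holds)
    s = zero(eltype(A))
    for k in nzrange(A, j)      # positions of column j's stored entries
        s += vals[k]
    end
    return s
end

# 3x3 matrix with entries (1,1)=10, (3,1)=20, (2,3)=5
A = sparse([1, 3, 2], [1, 1, 3], [10.0, 20.0, 5.0], 3, 3)

col1_sum = column_sum(A, 1)
col1_rows = rowvals(A)[nzrange(A, 1)]   # row indices stored in column 1
```

Code written this way does not depend on the field names, which is what would make a deprecation cheap for it.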
As a side note, I added getproperty definitions for SparseMatrixCSC and SparseVector to the test suite in JuliaLang/julia#32953 so that we can enforce a don't-use-fields rule in future development of SparseArrays.jl itself. So, enforcing the rule inside SparseArrays.jl is not a strong enough argument to ban field access for users.
(ping @fredrikekre as you reacted in the previous discussion)
Having thought about this a bit since the previous discussion https://github.com/JuliaLang/julia/pull/32953#issuecomment-524173844, I'm now somewhat against deprecating or banning field access. It is trivial for new AbstractSparseMatrixCSC subtypes to implement a SparseMatrixCSC-compatible getproperty. So, making an old code base compatible with (SparseMatrixCSC-compatible) AbstractSparseMatrixCSC subtypes is as easy as relaxing the type constraints. Considering that an unknown number of public and private code bases use SparseMatrixCSC, I don't think it would make sense to break them just to notify them that we now have AbstractSparseMatrixCSC.
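To make the getproperty point concrete, here is a rough sketch, assuming a Julia version in which `SparseArrays.AbstractSparseMatrixCSC` is available. The wrapper type `TaggedCSC` and the function `count_stored_legacy` are hypothetical names invented for this example, not anything in SparseArrays.

```julia
using SparseArrays

# Hypothetical custom matrix type that wraps a SparseMatrixCSC.
struct TaggedCSC{Tv,Ti<:Integer} <: SparseArrays.AbstractSparseMatrixCSC{Tv,Ti}
    data::SparseMatrixCSC{Tv,Ti}
    tag::Symbol
end

Base.size(A::TaggedCSC) = size(getfield(A, :data))

# SparseMatrixCSC-compatible property access: forward the classic
# field names to the wrapped matrix, everything else to our own fields.
function Base.getproperty(A::TaggedCSC, name::Symbol)
    if name in (:m, :n, :colptr, :rowval, :nzval)
        return getproperty(getfield(A, :data), name)
    end
    return getfield(A, name)
end

# "Old" field-based code; the only change is the relaxed type constraint
# (it would previously have been restricted to SparseMatrixCSC).
function count_stored_legacy(A::SparseArrays.AbstractSparseMatrixCSC)
    return A.colptr[A.n + 1] - 1
end

S = sparse([1, 2, 3], [1, 2, 2], [1.0, 2.0, 3.0], 3, 3)
T = TaggedCSC(S, :demo)

stored_plain = count_stored_legacy(S)
stored_wrapped = count_stored_legacy(T)
```

This sketch defines only what the example needs; a real subtype would have to implement more of the interface before it could be displayed or used in arithmetic.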
I would personally be in favour of deprecating and generally having everyone use accessor methods for all sparse data structures going forward.
But, at the same time, we probably have many other things to do in sparse matrix land and this is perhaps not the most pressing issue. While the implementation of this is straightforward, it may cause quite a bit of follow-up work as it propagates through the packages.
How about we leave this issue open to collect comments for now (while we focus our attention on other things)?
+1 for leaving it open and focusing on other issues. We can come back to this anytime later.
Actually, the fields of SparseMatrixCSC are mentioned in the documentation:
https://github.com/JuliaLang/julia/blob/da9685b516c2e7ed7689dd8c803d8188d6e1f3a1/stdlib/SparseArrays/docs/src/index.md#L12-L27
It explicitly says "internal representation", so I don't know whether it is considered a public API.
I believe those are documented for the purpose of explanation. Internal representation certainly means "do not count on these".
In many situations, the only way to use and modify SparseMatrixCSC efficiently is to use these fields, so they are de facto public. Just removing them or changing their names is likely to be very disruptive to the ecosystem (grepping through packages for .colptr etc. should make this clear).
There are like five issues/PRs discussing changing the sparse array APIs at this point. And the one that Simon opened with the clearest plan has been closed in favor of a PR with an unclear plan and breaking changes that we definitely cannot make. Can we reopen the original plan issue and actually come up with a coherent plan before making any more half-baked PRs changing things?
The one that Simon had was not about API but rationalization of field names, which is something we are clearly not doing. I don't see the point about opening up that discussion again.
What half-baked PRs are you talking about? It can be confusing to follow along since the work is spread out across several PRs, but I don't see how you can label these contributions as half-baked.
> There are like five issues/PRs discussing changing the sparse array APIs at this point.
That's why I closed JuliaLang/julia#33050 in favor of JuliaLang/julia#33054 (see https://github.com/JuliaLang/julia/pull/33050#issuecomment-524616744). I thought keeping this issue open made sense, as JuliaLang/julia#33054 is about API addition, not deletion.
> Simon opened with the clearest plan has been closed in favor of a PR with an unclear plan and breaking changes that we definitely cannot make.
There is nothing breaking in JuliaLang/julia#33054 at all. Or at least that has been my intention. Please comment in JuliaLang/julia#33054 if you find anything breaking.
I apologize about calling your PRs half-baked. It was uncalled for and shitty of me. I appreciate all your work on these things and don’t want to discourage it. I’m frustrated about a few things:
- That the sparse APIs didn’t get more attention pre-1.0, which is obviously not your fault and is water under the bridge at this point;
- That we had a pretty solid plan a year ago that didn’t get any traction or much attention;
- That there are too many PRs making breaking changes that we definitely can’t merge;
- That sparse and linear algebra in general have had a distinct lack of maintenance and leadership for the past year or so;
- Because sparse and linalg are still stdlibs, this has become a pretty major burden for language development: there are more stalled-out sparse and linalg issues and PRs than any other category of issue/PR on the repo.
My proposal: either reopen Simon’s issue proposing better names or create a new issue (not a PR) to discuss what to do about the sparse APIs; come to some kind of agreement on that issue and only then set about executing the plan.
I do not think that we can break existing code that accesses fields, but we can rename fields and provide getproperty methods that allow the old names to continue to work. So pick the names you want, change them to those, make the old ones work, update the docs.
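As an illustration of the rename-plus-alias idea (not code from this thread), here is a minimal sketch on a stand-in type. The type `RenamedCSC` and the new field names are hypothetical placeholders, since no names have been agreed on; only the old names `m`, `n`, `colptr`, `rowval`, and `nzval` come from SparseMatrixCSC.

```julia
# Hypothetical stand-in type with made-up "new" field names.
struct RenamedCSC{Tv,Ti<:Integer}
    nrows::Int
    ncols::Int
    colpointers::Vector{Ti}
    rowindices::Vector{Ti}
    nzvalues::Vector{Tv}
end

# Old field name => new field name (the right-hand sides are assumptions).
const OLD_TO_NEW = (m = :nrows, n = :ncols, colptr = :colpointers,
                    rowval = :rowindices, nzval = :nzvalues)

# Old names keep working: translate them, pass new names through unchanged.
function Base.getproperty(A::RenamedCSC, name::Symbol)
    return getfield(A, get(OLD_TO_NEW, name, name))
end

# Advertise both sets of names, e.g. for tab completion.
Base.propertynames(A::RenamedCSC) = (fieldnames(typeof(A))..., keys(OLD_TO_NEW)...)

R = RenamedCSC(2, 2, [1, 2, 3], [1, 2], [4.0, 5.0])
old_style = R.colptr        # old name, resolved through the alias table
new_style = R.colpointers   # new name, resolved directly
```

Because the vectors are returned by reference, old code that mutates `A.nzval` in place would keep working under this scheme.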
I have reopened https://github.com/JuliaLang/julia/issues/25118 that @simonbyrne opened about fieldnames for SparseMatrixCSC.
@StefanKarpinski I think I understand your frustration about the linear algebra / sparse arrays code base. I imagine many maintainers and contributors share similar feelings. (Off topic, but it would be nice if these stdlibs could move to a Pkg-like development mode, which might mitigate some of the issues.)
I summarized the recent activity related to SparseMatrixCSC in the issue opened by @simonbyrne: https://github.com/JuliaLang/julia/issues/25118#issuecomment-524654446. It would be nice if you could share your thoughts about it there.