sprs icon indicating copy to clipboard operation
sprs copied to clipboard

Safety implications of unsorted indices

Open mulimoen opened this issue 3 years ago • 5 comments

This is a discussion issue for how to handle unsorted indices in arrays and matrices. The situation as it is today is relaxed, we try to avoid it, having unsorted indices won't cause safety issues, but might produce unexpected results/panics in some algorithms.

The primary motivation for bringing this up is to allow/disallow future optimisations to take advantage of the sorted property, and how the library should behave when these invariants are broken.

Should we formalise this behaviour and document it or should we disallow creation of unsorted structures?

Ref #231, #35, #16, #136

mulimoen avatar Sep 26 '20 10:09 mulimoen

As you pointed out, there are two separate concerns here:

  • actual memory safety. We want to be able to rely on CsMat invariants to accelerate the more critical algorithms with eg unchecked array accesses. For the moment, I have not seen an algorithm that relies on the sorted indices property to be memory safe, the important property is having the indices in bounds.
  • correctness. Some algorithms rely on the sorted property to work, for instance finding an existing non-zero using dichotomic search.
  • performance. I've always assumed the sorted property could help some algorithms to perform better, but I'm not sure I've seen that many algorithms.

Here are the possible ways forward in my opinion, including the options you mentioned:

  • Allow unsorted indices, but document algorithms that won't work in that case. Pros: less checks when creating a sparse matrix. Cons: unexpected failures, hard to debug.
  • Disallow the creation of unsorted structures. Pros: this is the closest to the current situation, except bugs like #231. Cons: makes the library less flexible, adds some runtime checks that can be costly. Though as you showed in #228 it is way less expensive to check than to sort (O(n) vs O(nln(n)), and we already need O(n) checks for safety).
  • Track the sorted state, either dynamically, or statically (#16). Pros: maximum flexibility. Cons: less ergonomic library, particularly if we use the type system to track this state. Maybe adding an has_sorted_indices: bool field to the matrix would be enough. Then algorithms that need sorted indices would be able to fail, or sort, when they get unsorted indices.

Do you see other alternatives?

vbarrielle avatar Sep 26 '20 11:09 vbarrielle

The usage of unsafe in this crate is minimal, which is great for memory safety. We should not have problems with this, as long as all indices are in bounds when passing to external bindings.

The sorted indices are very nice for so many algorithms, dot-product, kronecker product, arrays from a matrix slice*, more efficient use of cache.

I think option 2 would be preferable, as most algorithms that works on sorted indices will probably also produce sorted indices. We can avoid the runtime checks by allowing an unsafe constructor. This would also be the least amount of breakage and API change as you mention.

In #35 you mention the need for unsorted indices. Has this changed?

[*]: If we introduce a new slicetype with an offset field

mulimoen avatar Sep 26 '20 12:09 mulimoen

The sorted indices are very nice for so many algorithms, dot-product, kronecker product, arrays from a matrix slice*, more efficient use of cache. I think option 2 would be preferable, as most algorithms that works on sorted indices will probably also produce sorted indices. We can avoid the runtime checks by allowing an unsafe constructor. This would also be the least amount of breakage and API change as you mention.

Yes I think I agree, the sorted indices is a very convenient property in many algorithms. The unsafe constructor is the way to go.

In #35 you mention the need for unsorted indices. Has this changed?

There are some decompositions that can only produce unsorted indices as far as I know. But the case is quite rare, and they don't need that many operations, so when these decompositions get implemented we can define a limited sparse matrix type with unsorted indices to help in these decompositions.

vbarrielle avatar Sep 27 '20 20:09 vbarrielle

It seems to me like some algorithms requiring sorted indices and some not could be handled with a private is_sorted Field, which is assumed to be correctly set. When creating a matrix from raw, there could be three options:

  • from_raw Which checks if the array is sorted and sets is_sorted accordingly.
  • from_unsorted_raw which performs no check and sets is_sorted to false
  • unsafe from_sorted_raw_unchecked which performs no check and sets is_sorted to true

Additionally, a method to sort if needed would be useful. Possibly sorted_view And sort Methods?

maboesanman avatar Oct 01 '20 12:10 maboesanman

Just for reference the C++ lib eigen has similar questions: https://gitlab.com/libeigen/eigen/-/issues/694#note_254554694

vbarrielle avatar Oct 06 '20 14:10 vbarrielle