Distributed Compressed Sparse Row Matrix

Open Mystic-Slice opened this issue 2 years ago • 3 comments

Description

Distributed Compressed Sparse Row Matrix: Dcsr_matrix

A format for the efficient storage and manipulation of sparse data, i.e. matrices in which most entries are zero. This distributed implementation builds on torch.sparse_csr_tensor, which serves as the process-local storage. It supports distribution along axis 0 (rows) only; splitting along other axes is omitted because it does not work well with this format. The API closely mimics the scipy sparse library.
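As a rough illustration of the layout (not the code in this PR), the sketch below builds the three CSR arrays for a small matrix and splits them along the rows. It uses plain Python lists instead of torch tensors. The helper names `dense_to_csr` and `split_rows` are hypothetical, and the "as even as possible" chunking rule is an assumption about how rows might be assigned to processes.

```python
from typing import List, Sequence, Tuple

# A CSR matrix as three flat lists: (data, indices, indptr).
CSR = Tuple[List[float], List[int], List[int]]


def dense_to_csr(dense: Sequence[Sequence[float]]) -> CSR:
    """Store only the non-zero values, their column indices, and row offsets."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(float(value))
                indices.append(col)
        # indptr[i + 1] marks where row i ends inside data/indices
        indptr.append(len(data))
    return data, indices, indptr


def split_rows(csr: CSR, nprocs: int) -> List[CSR]:
    """Split along axis 0 into one local CSR chunk per (hypothetical) process."""
    data, indices, indptr = csr
    nrows = len(indptr) - 1
    chunks, start = [], 0
    for rank in range(nprocs):
        # Assumed chunking rule: the first nrows % nprocs ranks get one extra row.
        count = nrows // nprocs + (1 if rank < nrows % nprocs else 0)
        stop = start + count
        lo, hi = indptr[start], indptr[stop]
        # Rebase the row offsets so that every local indptr starts at 0.
        local_indptr = [p - lo for p in indptr[start:stop + 1]]
        chunks.append((data[lo:hi], indices[lo:hi], local_indptr))
        start = stop
    return chunks


matrix = [[1, 0, 0], [0, 0, 2], [0, 3, 0], [4, 0, 5]]
csr = dense_to_csr(matrix)
local_chunks = split_rows(csr, nprocs=2)
```

Splitting along rows only needs a contiguous slice of `data` and `indices` plus a rebased `indptr`, which is why axis 0 fits the format well. Note that with a single process the only chunk spans all rows, so the local shape equals the global shape.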

ht.sparse.sparse_csr_matrix is the sparse counterpart of the ht.array factory function. It takes either a torch.sparse_csr_tensor or a scipy.sparse.csr_matrix as input and returns a Dcsr_matrix.

Element-wise operations are implemented through a shared binary operator. Currently only addition and multiplication are supported, and only for the float datatype, a limitation that comes from the use of torch.sparse_csr_tensor.
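To make the idea of an element-wise operation on CSR data concrete, here is a minimal pure-Python sketch. It is not the implementation in this PR, which relies on torch. The names `csr_row` and `csr_binary_op` are hypothetical, and the sketch assumes the operation maps two zeros to zero, which holds for addition and multiplication.

```python
import operator
from typing import Callable, Dict, List, Tuple

CSR = Tuple[List[float], List[int], List[int]]  # (data, indices, indptr)


def csr_row(csr: CSR, i: int) -> Dict[int, float]:
    """Return row i as a {column: value} mapping of its stored entries."""
    data, indices, indptr = csr
    return {indices[k]: data[k] for k in range(indptr[i], indptr[i + 1])}


def csr_binary_op(a: CSR, b: CSR, op: Callable[[float, float], float]) -> CSR:
    """Apply op element-wise, row by row; assumes op(0, 0) == 0."""
    nrows = len(a[2]) - 1
    if len(b[2]) - 1 != nrows:
        raise ValueError("operands must have the same number of rows")
    data, indices, indptr = [], [], [0]
    for i in range(nrows):
        row_a, row_b = csr_row(a, i), csr_row(b, i)
        # Visit every column stored in either operand, in ascending order.
        for col in sorted(row_a.keys() | row_b.keys()):
            value = op(row_a.get(col, 0.0), row_b.get(col, 0.0))
            if value != 0.0:  # keep the result sparse
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


a = ([1.0, 2.0], [0, 2], [0, 1, 2])            # [[1, 0, 0], [0, 0, 2]]
b = ([3.0, 4.0, 5.0], [0, 1, 2], [0, 2, 3])    # [[3, 4, 0], [0, 0, 5]]
total = csr_binary_op(a, b, operator.add)
product = csr_binary_op(a, b, operator.mul)
```

Because the loop works one row at a time, a row-split matrix could apply the same routine to each local chunk independently, provided both operands are split identically.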

A Dcsr_matrix can be converted to the dense format (a DNDarray) using the todense method.
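Conceptually, densifying a row-split CSR matrix means expanding each local chunk and stacking the results in rank order. The sketch below shows that idea with plain lists. The function names are hypothetical, and it does not reflect how todense is actually implemented or how it communicates between processes.

```python
from typing import List, Sequence, Tuple

CSR = Tuple[List[float], List[int], List[int]]  # (data, indices, indptr)


def csr_to_dense(csr: CSR, ncols: int) -> List[List[float]]:
    """Expand one CSR chunk into a list of dense rows."""
    data, indices, indptr = csr
    dense = []
    for i in range(len(indptr) - 1):
        row = [0.0] * ncols
        # Scatter the stored entries of row i into their columns.
        for k in range(indptr[i], indptr[i + 1]):
            row[indices[k]] = data[k]
        dense.append(row)
    return dense


def stack_chunks(chunks: Sequence[CSR], ncols: int) -> List[List[float]]:
    """Densify every local chunk and concatenate them along axis 0."""
    rows: List[List[float]] = []
    for chunk in chunks:
        rows.extend(csr_to_dense(chunk, ncols))
    return rows


# Two row chunks of a 4 x 3 matrix, as they might live on two processes.
chunks = [
    ([1.0, 2.0], [0, 2], [0, 1, 2]),
    ([3.0, 4.0, 5.0], [1, 0, 2], [0, 1, 3]),
]
dense = stack_chunks(chunks, ncols=3)
```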

Further work:

  1. Exhaustive tests
  2. Other element-wise operations such as bitwise and, or, etc.
  3. Matrix multiplication

Project Description: GSoC Project Idea - 2

Type of change

  • New feature

Due Diligence

  • [ ] All split configurations tested
  • [ ] Multiple dtypes tested in relevant functions
  • [ ] Documentation updated (if needed)
  • [ ] Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

yes / no

skip ci

Mystic-Slice avatar Sep 17 '22 15:09 Mystic-Slice

Failed test on one process

=================================== FAILURES ===================================
_________________________ TestDcsr_matrix.test_larray __________________________
self = <heat.sparse.tests.test_dcsrmatrix.TestDcsr_matrix testMethod=test_larray>
    def test_larray(self):
        heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr)
    
        self.assertIsInstance(heat_sparse_csr.larray, torch.Tensor)
        self.assertEqual(heat_sparse_csr.larray.layout, torch.sparse_csr)
        self.assertEqual(heat_sparse_csr.larray.shape, heat_sparse_csr.lshape)
        self.assertEqual(heat_sparse_csr.larray.shape, heat_sparse_csr.gshape)
    
        # Distributed case
        heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr, split=0)
    
        self.assertIsInstance(heat_sparse_csr.larray, torch.Tensor)
        self.assertEqual(heat_sparse_csr.larray.layout, torch.sparse_csr)
        self.assertEqual(heat_sparse_csr.larray.shape, heat_sparse_csr.lshape)
>       self.assertNotEqual(heat_sparse_csr.larray.shape, heat_sparse_csr.gshape)
E       AssertionError: torch.Size([5, 5]) == (5, 5)
heat/sparse/tests/test_dcsrmatrix.py:44: AssertionError

mtar avatar Sep 23 '22 07:09 mtar

Codecov Report

Merging #1028 (6d45bd4) into main (9cce973) will increase coverage by 0.00%. The diff coverage is 91.76%.

@@           Coverage Diff            @@
##             main    #1028    +/-   ##
========================================
  Coverage   91.75%   91.76%            
========================================
  Files          65       72     +7     
  Lines       10024    10352   +328     
========================================
+ Hits         9198     9499   +301     
- Misses        826      853    +27     
Flag    Coverage Δ
unit    91.76% <91.76%> (+<0.01%) ↑

Flags with carried forward coverage won't be shown.

Impacted Files                   Coverage Δ
heat/core/_operations.py         96.04% <ø> (ø)
heat/sparse/_operations.py       76.56% <76.56%> (ø)
heat/sparse/dcsr_matrix.py       93.93% <93.93%> (ø)
heat/sparse/factories.py         95.29% <95.29%> (ø)
heat/__init__.py                 100.00% <100.00%> (ø)
heat/core/communication.py       96.21% <100.00%> (+0.01%) ↑
heat/sparse/__init__.py          100.00% <100.00%> (ø)
heat/sparse/arithmetics.py       100.00% <100.00%> (ø)
heat/sparse/manipulations.py     100.00% <100.00%> (ø)
heat/sparse/tests/__init__.py    100.00% <100.00%> (ø)

codecov[bot] avatar Sep 23 '22 08:09 codecov[bot]

@ClaudiaComito I have made all the changes you requested. Thanks for your review! And yes, a to_sparse method would be amazing. I will work on it in a separate PR; I hope that's alright.

Mystic-Slice avatar Nov 15 '22 17:11 Mystic-Slice