Distributed Compressed Sparse Column Matrix
Due Diligence
- General:
  - [x] title of the PR is suitable to appear in the Release Notes
- Implementation:
  - [x] unit tests: all split configurations tested
  - [x] unit tests: multiple dtypes tested
  - [x] documentation updated where needed
Description
A new format, similar to the one implemented in https://github.com/helmholtz-analytics/heat/pull/1028. Most of the code from the previous implementation is shared between the two formats; a short usage sketch follows the steps below.
Steps:
- [x] Implement a common base class
- [x] Factory methods
- [x] Sparse and Dense formats interconversion
- [x] Arithmetic Operations
- [x] Tests
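For reference, a minimal usage sketch, assuming the CSC factory mirrors the CSR one from #1028. The `ht.sparse.sparse_csc_matrix` call appears in the tests below; the `split` value and the `todense` call are assumptions here, not confirmed API:

```python
import torch
import heat as ht

# Build a torch CSC tensor and wrap it in the distributed class.
t = torch.tensor([[0.0, 1.0], [2.0, 0.0]]).to_sparse_csc()
dcsc = ht.sparse.sparse_csc_matrix(t, split=1)  # assumed: CSC distributes along columns

# Round-trip back to a dense DNDarray (method name assumed to match the CSR class).
dense = dcsc.todense()
```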
NOTE: In the future, when arithmetic operations are supported by `torch` for `sparse_csc`, they can be enabled for `DCSC_matrix` by restoring the code removed in commit https://github.com/helmholtz-analytics/heat/pull/1377/commits/6d727affd5c9cff63f84896e2c8e77fcfe042335. The binary operator has already been modified to be generic for both `DCSR_matrix` and `DCSC_matrix`.
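For illustration only, a hypothetical sketch of what such format-generic dispatch could look like; the function and its name are invented for this sketch and do not reflect heat's actual internals:

```python
import torch

def binary_op_sparse(op, a, b):
    """Apply an elementwise torch op to two sparse compressed tensors of the same layout."""
    if a.layout != b.layout:
        raise ValueError("operands must share the same sparse layout")
    if a.layout == torch.sparse_csc:
        # This is where the removed code would be restored once torch
        # implements elementwise operations for sparse_csc.
        raise NotImplementedError("torch does not implement elementwise ops for sparse_csc yet")
    return op(a, b)  # CSR path: torch supports elementwise add/mul here
```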
Type of change
New feature
I just discovered something that I should have seen a long time ago.
`torch` only officially supports the `matmul` operation for its CSR and CSC formats (https://pytorch.org/docs/stable/sparse.html#csr-tensor-operations).
The elementwise operations (`add` and `mul`) are implemented for CSR (although not mentioned in their docs) but not for CSC.
It is still possible for us to implement `matmul` for the `ht.sparse` module. Just wanted to know how we should proceed.
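A quick check of that support matrix, as a sketch; exact error types and messages vary across torch versions, and the claims themselves come from the comment above:

```python
import torch

a = torch.eye(3).to_sparse_csr()
b = torch.eye(3).to_sparse_csr()
print((a @ b).to_dense())  # matmul: officially supported for CSR
print((a + b).to_dense())  # elementwise add: works for CSR, though undocumented

c = torch.eye(3).to_sparse_csc()
d = torch.eye(3).to_sparse_csc()
print((c @ d).to_dense())  # matmul: supported for CSC per the linked docs
try:
    c + d                  # elementwise ops are not implemented for CSC
except (RuntimeError, NotImplementedError) as e:
    print("CSC add not supported:", e)
```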
Thank you for the PR!
Codecov Report
Attention: Patch coverage is 92.85714% with 9 lines in your changes missing coverage. Please review.
Project coverage is 91.74%. Comparing base (ee0d72a) to head (90dc1d6). Report is 325 commits behind head on main.
Files | Patch % | Lines |
---|---|---|
heat/sparse/_operations.py | 78.94% | 4 Missing :warning: |
heat/sparse/dcsx_matrix.py | 93.02% | 3 Missing :warning: |
heat/sparse/factories.py | 96.96% | 1 Missing :warning: |
heat/sparse/manipulations.py | 95.00% | 1 Missing :warning: |
Additional details and impacted files
```diff
@@ Coverage Diff @@
##             main    #1377      +/-  ##
=========================================
- Coverage   91.76%   91.74%   -0.03%
=========================================
  Files          80       80
  Lines       11640    11683      +43
=========================================
+ Hits        10682    10719      +37
- Misses        958      964       +6
```
Flag | Coverage Δ | |
---|---|---|
unit | 91.74% <92.85%> (-0.03%) | :arrow_down: |
@Mystic-Slice thank you so much for this, and apologies for the delay.
It's OK not to provide the element-wise operations for CSC if torch doesn't implement them yet.
The basic operation we want to be able to provide with this class is matrix multiplication CSR @ CSC, where the CSR is distributed (split=0) and the CSC is not distributed (sketched below). This operation requires no MPI communication and returns a distributed CSR. It would be a great starting point, and it doesn't need to be part of this PR.
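For illustration, a minimal sketch in plain torch (not the heat API) of why this needs no communication: with split=0, each process owns a contiguous block of rows of the left operand and a full copy of the right one, so every process can compute its block of rows of the product locally. A dense right-hand side stands in for the CSC operand here to keep the sketch runnable:

```python
import torch

# Two "ranks", each owning a contiguous block of rows of A (split=0);
# B is fully replicated on every rank.
A = torch.randn(6, 4)
B = torch.randn(4, 5)

local_blocks = [A[:3].to_sparse_csr(), A[3:].to_sparse_csr()]

# Each rank computes its block of rows of A @ B with purely local work.
local_results = [blk @ B for blk in local_blocks]  # sparse CSR @ dense -> dense

# Concatenating the local results reproduces the global product, row block
# by row block; no data from other ranks was ever needed.
assert torch.allclose(torch.cat(local_results), A @ B)
```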
Is this PR, the CSC class, ready for review @Mystic-Slice?
Thanks again, we highly appreciate your contribution!
Hi @ClaudiaComito, sorry for the delay from my side too. The implementation of DCSC_matrix is complete; I just need to write a few tests, which I will finish ASAP.
If tests aren't a blocker for review, we can proceed with that.
Hi @Mystic-Slice, thanks, the tests make the review so much easier, so please go ahead and sketch the tests first. Thanks a lot!
@ClaudiaComito The tests are done. The PR is ready for review.
Waiting on #1443 (solves the `test_random` failure on the AMD runner, among other things).
@Mystic-Slice on the CUDA runner, with PyTorch 2.2, we get the following error:
```
___________ ERROR collecting heat/sparse/tests/test_manipulations.py ___________
/opt/conda/envs/heat_dev/lib/python3.10/site-packages/_pytest/runner.py:341: in from_call
result: Optional[TResult] = func()
/opt/conda/envs/heat_dev/lib/python3.10/site-packages/_pytest/runner.py:372: in <lambda>
call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
/opt/conda/envs/heat_dev/lib/python3.10/site-packages/_pytest/python.py:531: in collect
self._inject_setup_module_fixture()
/opt/conda/envs/heat_dev/lib/python3.10/site-packages/_pytest/python.py:545: in _inject_setup_module_fixture
self.obj, ("setUpModule", "setup_module")
/opt/conda/envs/heat_dev/lib/python3.10/site-packages/_pytest/python.py:310: in obj
self._obj = obj = self._getobj()
/opt/conda/envs/heat_dev/lib/python3.10/site-packages/_pytest/python.py:528: in _getobj
return self._importtestmodule()
/opt/conda/envs/heat_dev/lib/python3.10/site-packages/_pytest/python.py:617: in _importtestmodule
mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
/opt/conda/envs/heat_dev/lib/python3.10/site-packages/_pytest/pathlib.py:565: in import_path
importlib.import_module(module_name)
/opt/conda/envs/heat_dev/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1050: in _gcd_import
???
<frozen importlib._bootstrap>:1027: in _find_and_load
???
<frozen importlib._bootstrap>:992: in _find_and_load_unlocked
???
<frozen importlib._bootstrap>:241: in _call_with_frames_removed
???
<frozen importlib._bootstrap>:1050: in _gcd_import
???
<frozen importlib._bootstrap>:1027: in _find_and_load
???
<frozen importlib._bootstrap>:1006: in _find_and_load_unlocked
???
<frozen importlib._bootstrap>:688: in _load_unlocked
???
<frozen importlib._bootstrap_external>:883: in exec_module
???
<frozen importlib._bootstrap>:241: in _call_with_frames_removed
???
heat/sparse/tests/__init__.py:1: in <module>
from .test_arithmetics_csr import *
<frozen importlib._bootstrap>:1027: in _find_and_load
???
<frozen importlib._bootstrap>:1006: in _find_and_load_unlocked
???
<frozen importlib._bootstrap>:688: in _load_unlocked
???
/opt/conda/envs/heat_dev/lib/python3.10/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
exec(co, module.__dict__)
heat/sparse/tests/test_arithmetics_csr.py:4: in <module>
import heat as ht
heat/__init__.py:20: in <module>
from . import utils
heat/utils/__init__.py:5: in <module>
from . import data
heat/utils/data/__init__.py:7: in <module>
from . import mnist
heat/utils/data/mnist.py:7: in <module>
from torchvision import datasets
/opt/conda/envs/heat_dev/lib/python3.10/site-packages/torchvision/__init__.py:6: in <module>
from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
/opt/conda/envs/heat_dev/lib/python3.10/site-packages/torchvision/_meta_registrations.py:26: in <module>
def meta_roi_align(input, rois, spatial_scale, pooled_height, pooled_width, sampling_ratio, aligned):
/opt/conda/envs/heat_dev/lib/python3.10/site-packages/torchvision/_meta_registrations.py:18: in wrapper
if torchvision.extension._has_ops():
E AttributeError: partially initialized module 'torchvision' has no attribute 'extension' (most likely due to a circular import)
```
Many test files return the same error. But we can't reproduce it with other PRs. Can you check it out? The ROCm runner runs fine, by the way.
I was able to recreate this error on my local machine.
It occurs when the torch and torchvision versions do not match.
In my case, specifically, `torch==2.2` and `torchvision==0.18.0` (which corresponds to `torch==2.3`).
When I made the versions match, everything worked fine.
I believe the torch version on the CUDA runner is 2.2 and not 2.3, but I am not sure.
Is there a way to check that?
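One way to check the installed pair (the comment above implies a mismatched torch/torchvision combination triggers the circular-import error):

```python
# Print the installed torch/torchvision versions, e.g. inside the runner's env.
import torch
import torchvision

print(torch.__version__, torchvision.__version__)
```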
~~@mtar can you look into this? thanks~~ never mind
@Mystic-Slice this looks amazing. Will you add your name to the CITATION.cff file, under the `# release contributors - add as needed` header? Thanks!
Done!
@Mystic-Slice one single test fails with `torch<2`, see below. From my side it's absolutely OK to skip the test when `torch<2`.
```
_________________________ TestDCSC_matrix.test_astype __________________________
self = <heat.sparse.tests.test_dcscmatrix.TestDCSC_matrix testMethod=test_astype>
def test_astype(self):
heat_sparse_csc = ht.sparse.sparse_csc_matrix(self.ref_torch_sparse_csc)
# check starting invariant
self.assertEqual(heat_sparse_csc.dtype, ht.float32)
# check the copy case for uint8
> as_uint8 = heat_sparse_csc.astype(ht.uint8)
heat/sparse/tests/test_dcscmatrix.py:273:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = , dtype = <class 'heat.core.types.uint8'>, copy = True
def astype(self, dtype, copy=True) -> __DCSX_matrix:
"""
Returns a casted version of this matrix.
Casted matrix is a new matrix of the same shape but with given type of this matrix. If copy is ``True``, the
same matrix is returned instead.
Parameters
----------
dtype : datatype
HeAT type to which the matrix is cast
copy : bool, optional
By default the operation returns a copy of this matrix. If copy is set to ``False`` the cast is performed
in-place and this matrix is returned
"""
dtype = canonical_heat_type(dtype)
> casted_matrix = self._array.type(dtype.torch_type())
E RuntimeError: torch.copy_: only sparse compressed tensors with the same number of specified elements are supported.
heat/sparse/dcsx_matrix.py:309: RuntimeError
```
@ClaudiaComito I referred to https://discuss.pytorch.org/t/why-does-pytorch-needs-the-three-functions-to-type-and-type-as/20164. It seems like `.type()` doesn't support conversion of a lot of Tensor types; I should have used the `.to()` method instead.
Made the change.
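A sketch of the change described above, on a plain torch tensor rather than the actual `heat/sparse/dcsx_matrix.py` code. This works on recent torch; as the next comment notes, it still failed for torch < 2.0:

```python
import torch

csc = torch.eye(3).to_sparse_csc()

# Original call, which raised the "torch.copy_: only sparse compressed
# tensors ..." error shown in the traceback above:
# casted = csc.type(torch.uint8)

# Replacement call described in the comment:
casted = csc.to(torch.uint8)
print(casted.dtype)  # torch.uint8
```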
Thank you for the PR!
Seems like it didn't work.
I will just skip the test for `torch < 2.0`.
Sorry for the delay; I thought the previous fix would work.
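One way such a version gate could look; the decorator placement and the version check are illustrative, not the actual heat test code:

```python
import unittest
import torch

# Skip dtype-casting tests on torch < 2.0, where casting sparse compressed
# tensors is not supported (see the failure above).
TORCH_LT_2 = int(torch.__version__.split(".")[0]) < 2

class TestDCSC_matrix(unittest.TestCase):
    @unittest.skipIf(TORCH_LT_2, "sparse CSC astype requires torch >= 2.0")
    def test_astype(self):
        ...
```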