heat
Features/unique sort distributed
Description
This PR introduces major changes to the ht.unique() implementation, fixing some bugs and inconsistencies along the way (see below).
Changes proposed:
Distributed unique requires two passes:
- find local sorted unique elements,
- find global sorted unique elements.
The current (v0.5.1) implementation solves step 2 by running torch.unique again on the gathered local unique elements. This might turn into a memory bottleneck for very large data.
The main implementation change in this PR is that, in the distributed case, ht.unique now recycles the "pivot sorting" implementation (see ht.sort(), manipulations._pivot_sorting()) to perform an Alltoallv-based sorted unique operation that doesn't require "gathering".
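The two passes and the pivot-based exchange can be sketched in plain Python, with a list of lists standing in for the per-process data. This is only an illustration under simplifying assumptions, not the code in manipulations._pivot_sorting(): the helper names pick_samples and distributed_unique are hypothetical, the pivot selection is deliberately naive, and the Alltoallv exchange is simulated with nested lists instead of MPI.

```python
from bisect import bisect_left


def pick_samples(sorted_values, k):
    """Return up to k roughly evenly spaced elements of a sorted list."""
    if k <= 0 or not sorted_values:
        return []
    step = max(len(sorted_values) // (k + 1), 1)
    return sorted_values[step::step][:k]


def distributed_unique(chunks):
    """Simulate a pivot-based sorted unique over len(chunks) 'processes'.

    chunks[r] is the local data of process r. The result is one sorted,
    duplicate-free list per process; concatenated in rank order they form
    the global sorted unique.
    """
    nprocs = len(chunks)

    # Pass 1: every process computes its local sorted unique elements.
    local = [sorted(set(chunk)) for chunk in chunks]

    # Only a few samples per process are pooled to choose the pivots
    # (hypothetical, simplified pivot selection).
    samples = sorted({s for u in local for s in pick_samples(u, nprocs - 1)})
    pivots = pick_samples(samples, nprocs - 1)

    # Pass 2: each process buckets its local uniques by pivot; bucket d is
    # "sent" to process d. send[src][dest] mimics the Alltoallv buffers.
    send = [[[] for _ in range(nprocs)] for _ in range(nprocs)]
    for src, uniques in enumerate(local):
        for value in uniques:
            send[src][bisect_left(pivots, value)].append(value)

    # Each process removes duplicates from what it received. No process
    # ever holds all local uniques at once.
    result = []
    for dest in range(nprocs):
        received = set()
        for src in range(nprocs):
            received.update(send[src][dest])
        result.append(sorted(received))
    return result


example = distributed_unique([[5, 1, 3, 3], [2, 5, 9, 1], [7, 7, 8, 2]])
print(example)  # [[1, 2, 3], [5], [7, 8, 9]]
```

Because the buckets are delimited by sorted pivots, each process only needs to deduplicate its own bucket, and the result is already globally sorted across ranks. Buckets can be uneven or empty, which is why the result is not necessarily balanced.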
The main user-side changes are as follows:
- ht.unique now, like numpy, always returns the SORTED unique elements.
- "sparse" vs. "dense" unique. If the collective size of the local uniques (from step 1 above) is smaller than the size of the local data, then ht.unique gathers everything and runs the operation locally. In this case, the unique elements array will have split=None. Otherwise, distributed unique via _pivot_sorting() (Alltoallv), returning a distributed DNDarray.
- inverse indices are now a DNDarray and distributed like the input data. Note that inverse indices are used to recreate the original data shape from the unique elements. However, the sorted unique elements corresponding to a given inverse index might be on a different process. Eventually, setitem should be able to deal with this; at the moment, unique[inverse] requires a unique.resplit_(None) first.
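The "sparse" vs. "dense" decision described above amounts to a single comparison. The sketch below takes the description literally; the function name choose_unique_strategy and its arguments are hypothetical and do not correspond to anything in the Heat code base.

```python
def choose_unique_strategy(local_unique_counts, local_data_size):
    """Pick between the gathered and the distributed unique.

    local_unique_counts: number of local unique elements on each process
                         (the result of step 1).
    local_data_size:     number of elements in the local data.
    """
    collective_unique_size = sum(local_unique_counts)
    if collective_unique_size < local_data_size:
        # "sparse": few uniques, gather them and run unique locally;
        # the result is not distributed (split=None).
        return "gather"
    # "dense": many uniques, use the Alltoallv-based pivot sorting;
    # the result stays distributed.
    return "distributed"


print(choose_unique_strategy([3, 4, 3], 1000))        # gather
print(choose_unique_strategy([900, 950, 980], 1000))  # distributed
```

The point of the comparison is that gathering is only chosen when the gathered uniques fit comfortably next to the data each process already holds.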
As an aside:
- get gethalo to work on imbalanced DNDarrays
- get create_lshape_map to only require communication for imbalanced DNDarrays
- resolve race condition in test_qr that has been popping up on and off for ages.
- ADDED 24 NOV 2021: factories.array behaviour when copy=False now closer to np.array (https://numpy.org/doc/stable/reference/generated/numpy.array.html).
Issue/s resolved: #363, #564, #621
Type of change
- Breaking change (fix or feature that would cause existing functionality to not work as expected):
  - ht.unique() always returns the sorted unique elements, kwarg sorted has been removed
  - inverse indices are no longer torch tensors, they're now DNDarrays and distributed like the input
  - unique.resplit_(None) might be required before applying inverse indices
  - NEW: factories.array(copy=False) does not copy slices of original data unless absolutely necessary (dtype, order etc.)
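Why unique.resplit_(None) might be needed before applying inverse indices can be seen in a small simulation. Plain Python lists stand in for the per-process chunks of the data and of the sorted unique elements; all variable names are hypothetical, and concatenating the unique chunks plays the role of resplit_(None).

```python
from bisect import bisect_left
from itertools import chain

# Per-process chunks of the input data and of the distributed sorted unique.
data_chunks = [[5, 1, 3, 3], [2, 5, 9, 1], [7, 7, 8, 2]]
unique_chunks = [[1, 2, 3], [5], [7, 8, 9]]

# Making every unique element available locally (the resplit_(None) analogue).
global_unique = list(chain.from_iterable(unique_chunks))

# Inverse indices are distributed like the input: one list per process,
# each entry being a position in the GLOBAL sorted unique array.
inverse_chunks = [
    [bisect_left(global_unique, value) for value in chunk]
    for chunk in data_chunks
]

# Process 0 only holds unique positions 0..2, but its inverse indices also
# refer to position 3, which lives on process 1.
local_positions_rank0 = range(len(unique_chunks[0]))
remote_refs_rank0 = [i for i in inverse_chunks[0] if i not in local_positions_rank0]

# With the full unique array available, indexing recreates the original data.
reconstructed = [[global_unique[i] for i in inv] for inv in inverse_chunks]

print(inverse_chunks[0])     # [3, 0, 2, 2]
print(remote_refs_rank0)     # [3]
print(reconstructed == data_chunks)  # True
```

Any inverse index that points outside the local slice of the unique array cannot be resolved without communication, which is what the resplit provides in this sketch.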
Due Diligence
- [x] All split configurations tested
- [x] Multiple dtypes tested in relevant functions
- [x] Documentation updated (if needed)
- [x] Updated changelog.md under the title "Pending Additions"
Does this change modify the behaviour of other functions? If so, which?
- the possibility to leave the data "unsorted" is not available any longer.
- operations expecting inverse indices to be a local torch tensor will fail.
- operations expecting ht.unique() to return a non-distributed DNDarray may fail in some cases.
Codecov Report
Attention: Patch coverage is 98.03150% with 5 lines in your changes missing coverage. Please review.
Project coverage is 95.48%. Comparing base (35f39d1) to head (159d1c5). Report is 1019 commits behind head on main.
Files | Patch % | Lines |
---|---|---|
heat/core/manipulations.py | 99.07% | 2 Missing :warning: |
heat/core/communication.py | 50.00% | 1 Missing :warning: |
heat/core/memory.py | 90.90% | 1 Missing :warning: |
heat/naive_bayes/gaussianNB.py | 75.00% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #749 +/- ##
==========================================
- Coverage 95.52% 95.48% -0.05%
==========================================
Files 64 64
Lines 9640 9678 +38
==========================================
+ Hits 9209 9241 +32
- Misses 431 437 +6
Flag | Coverage Δ | |
---|---|---|
gpu | 94.61% <98.03%> (-0.05%) | :arrow_down: |
unit | 91.09% <98.03%> (-0.03%) | :arrow_down: |
Flags with carried forward coverage won't be shown.