
Features/unique sort distributed

ClaudiaComito opened this pull request 3 years ago • 2 comments

Description

This PR introduces major changes in the ht.unique() implementation, fixing some bugs/inconsistencies along the way (see below).

Changes proposed:

Distributed unique requires two passes:

  1. find local sorted unique elements,
  2. find global sorted unique elements.

The current (v0.5.1) implementation solves step 2 by running torch.unique again on the gathered local unique elements. This can become a memory bottleneck for very large data.
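The two-pass scheme and the gather-based step 2 can be sketched in pure Python (processes simulated as nested lists; this is an illustration of the idea, not Heat's actual code):

```python
# Simulated process-local chunks of a distributed array.
local_chunks = [[3, 1, 3, 2], [5, 2, 2, 9], [1, 9, 7]]

# Pass 1: each process computes its sorted local uniques.
local_uniques = [sorted(set(chunk)) for chunk in local_chunks]

# Pass 2, v0.5.1 scheme: gather all local uniques and deduplicate again.
# With many processes and many distinct values, this gathered buffer is
# the potential memory bottleneck described above.
gathered = [u for lu in local_uniques for u in lu]
global_uniques = sorted(set(gathered))
print(global_uniques)  # [1, 2, 3, 5, 7, 9]
```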

The main implementation change in this PR is that, in the distributed case, ht.unique now reuses the "pivot sorting" implementation (see ht.sort(), manipulations._pivot_sorting()) to perform an Alltoallv-based sorted unique operation that does not require gathering.
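The pivot-based exchange can be sketched as follows. This is a simplified pure-Python simulation of the idea (hand-picked pivots, processes as lists); the real pivot selection and the Alltoallv exchange live inside _pivot_sorting():

```python
import bisect

local_chunks = [[3, 1, 3, 2], [5, 2, 2, 9], [1, 9, 7]]
nprocs = len(local_chunks)
local_uniques = [sorted(set(c)) for c in local_chunks]

# Global pivots p0 < p1 split the value range among processes:
# process 0 owns values <= 2, process 1 owns (2, 5], process 2 the rest.
pivots = [2, 5]

# Simulated Alltoallv: route each local unique to the owning process.
inbox = [[] for _ in range(nprocs)]
for lu in local_uniques:
    for v in lu:
        inbox[bisect.bisect_left(pivots, v)].append(v)

# Each process deduplicates what it received. Concatenating the pieces
# yields the globally sorted uniques -- no process ever held everything.
result = [sorted(set(b)) for b in inbox]
print(result)  # [[1, 2], [3, 5], [7, 9]]
```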

The main user-side changes are as follows:

  • ht.unique now, like numpy, always returns the SORTED unique elements.

  • "sparse" vs. "dense" unique: if the collective size of the local uniques (from step 1 above) is smaller than the size of the local data, ht.unique gathers everything and runs the operation locally; in this case the unique elements array has split=None. Otherwise, unique runs distributed via _pivot_sorting() (Alltoallv) and returns a distributed DNDarray.

  • inverse indices are now a DNDarray and distributed like the input data. Note that inverse indices are used to recreate the original data shape from the unique elements. However, the sorted unique element corresponding to a given inverse index might reside on a different process. Eventually, setitem should be able to deal with this; at the moment, unique[inverse] requires a unique.resplit_(None) first.
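The sparse/dense decision can be sketched as a tiny helper (hypothetical name and threshold mirroring the description above; the actual criterion lives inside ht.unique):

```python
def choose_strategy(global_unique_count, local_data_size):
    """Hypothetical decision rule: gather and run locally only when all
    local uniques combined are smaller than this process's data."""
    if global_unique_count < local_data_size:
        return "sparse"   # gather + local torch.unique -> split=None
    return "dense"        # _pivot_sorting() / Alltoallv -> distributed

print(choose_strategy(100, 1_000_000))        # few distinct values: sparse
print(choose_strategy(5_000_000, 1_000_000))  # many distinct values: dense
```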

As an aside:

  • get gethalo to work on imbalanced DNDarrays
  • get create_lshape_map to only require communication for imbalanced DNDarrays
  • resolve race condition in test_qr that has been popping up on and off for ages.
  • ADDED 24 NOV 2021: factories.array behaviour when copy=False is now closer to np.array's (https://numpy.org/doc/stable/reference/generated/numpy.array.html).

Issue/s resolved: #363, #564, #621

Type of change

  • Breaking change (fix or feature that would cause existing functionality to not work as expected):
    • ht.unique() always returns the sorted unique elements; the sorted kwarg has been removed
    • inverse indices are no longer torch tensors, they're now DNDarrays and distributed like the input
    • unique.resplit_(None) might be required before applying inverse indices
    • NEW: factories.array(copy=False) does not copy slices of the original data unless absolutely necessary (dtype, order, etc.)
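The inverse-index semantics behind the breaking changes above can be illustrated in plain Python: inverse indices map each original element to its position in the sorted uniques, which is why, in the distributed case, the addressed unique element may live on another process and unique.resplit_(None) may be needed before unique[inverse] (in Heat the analogous call would be ht.unique(x, return_inverse=True)):

```python
data = [3, 1, 3, 2, 1]
uniques = sorted(set(data))                  # [1, 2, 3]

# Inverse indices: position of each original element in the uniques.
inverse = [uniques.index(v) for v in data]   # [2, 0, 2, 1, 0]

# Indexing the uniques with the inverse rebuilds the original data.
rebuilt = [uniques[i] for i in inverse]
print(rebuilt == data)  # True
```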

Due Diligence

  • [x] All split configurations tested
  • [x] Multiple dtypes tested in relevant functions
  • [x] Documentation updated (if needed)
  • [x] Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

  • the option to leave the unique elements unsorted is no longer available.
  • operations expecting inverse indices to be a local torch tensor will fail.
  • operations expecting ht.unique() to return a non-distributed DNDarray may fail in some cases.

ClaudiaComito avatar Mar 24 '21 10:03 ClaudiaComito

Codecov Report

Attention: Patch coverage is 98.03150% with 5 lines in your changes missing coverage. Please review.

Project coverage is 95.48%. Comparing base (35f39d1) to head (159d1c5). Report is 1019 commits behind head on main.

Files                             Patch %   Lines
heat/core/manipulations.py        99.07%    2 Missing :warning:
heat/core/communication.py        50.00%    1 Missing :warning:
heat/core/memory.py               90.90%    1 Missing :warning:
heat/naive_bayes/gaussianNB.py    75.00%    1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #749      +/-   ##
==========================================
- Coverage   95.52%   95.48%   -0.05%     
==========================================
  Files          64       64              
  Lines        9640     9678      +38     
==========================================
+ Hits         9209     9241      +32     
- Misses        431      437       +6     
Flag   Coverage Δ
gpu    94.61% <98.03%> (-0.05%) :arrow_down:
unit   91.09% <98.03%> (-0.03%) :arrow_down:

Flags with carried forward coverage won't be shown.


codecov[bot] avatar Mar 25 '21 09:03 codecov[bot]

rerun tests

coquelin77 avatar Jul 13 '21 08:07 coquelin77