heat
heat copied to clipboard
Address distributed non-ordered indexing
Related All kind of stuff depends on distributed non-ordered indexing. Here a sample of issues/PRs where the problem has come up in various forms:
#607 #760 #903 #703 #824 #902 #177 #621 #749 #271 #857
Feature functionality
We want to be able to index a distributed DNDarray with a distributed, non-ordered key, and return the correct, stable result. Examples below. An implementation of this functionality via Alltoallv is available in ht.sort()
, needs to be generalized.
>>> a = ht.arange(50, split=0)
>>> a
DNDarray([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49], dtype=ht.int32, device=cpu:0, split=0)
>>> b = ht.random.randint(0,50,(20,), dtype=ht.int64, split=0)
>>> b
DNDarray([46, 5, 44, 8, 14, 10, 30, 15, 34, 30, 41, 44, 28, 26, 11, 20, 16, 7, 9, 8], dtype=ht.int64, device=cpu:0, split=0)
>>> c = a[b]
>>> c
DNDarray([46, 5, 44, 8, 14, 10, 30, 15, 34, 30, 41, 44, 28, 26, 11, 20, 16, 7, 9, 8], dtype=ht.int32, device=cpu:0, split=0)
In the current implementation, c = a[b]
returns a distributed DNDarray populated by whichever subset of the key is process-local.
On 2 processes:
c = DNDarray([ 5, 8, 14, 10, 15, 11, 20, 16, 7, 9, 8, 46, 44, 30, 34, 30, 41, 44, 28, 26], dtype=ht.int32, device=cpu:0, split=0)
On 3 processes:
c = DNDarray([ 5, 8, 14, 10, 15, 11, 16, 7, 9, 8, 30, 30, 28, 26, 20, 46, 44, 34, 41, 44], dtype=ht.int32, device=cpu:0, split=0)
Stil relevant: #938 is a currently active PR that will resolve this issue.
Reviewed within #1109.
Branch bugs/914-Address_distributed_non-ordered_indexing created!