Description

implement the __partitioned__ attribute to the DNDarray for compatibility with daal4py (https://github.com/IntelPython/DPPY-Spec/issues/3). At the moment, this is not used by heat internally. However, There are some ideas about how this could be done in the future.

Issue/s resolved: #772

Changes proposed:

added __partitioned__ attribute to DNDarray

Type of change

New feature (non-breaking change which adds functionality)

Due Diligence

[x] All split configurations tested
[x] Multiple dtypes tested in relevant functions
[x] Documentation updated (if needed)
[x] Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

Jun 08 '21 09:06 coquelin77

Codecov Report

Merging #788 (fd7ec83) into main (b9658d4) will increase coverage by 0.00%. The diff coverage is 91.89%.

@@           Coverage Diff           @@
##             main     #788   +/-   ##
=======================================
  Coverage   91.80%   91.80%           
=======================================
  Files          72       72           
  Lines       10445    10519   +74     
=======================================
+ Hits         9589     9657   +68     
- Misses        856      862    +6

Flag	Coverage Δ
unit	`91.80% <91.89%> (+<0.01%)`	:arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
heat/core/factories.py	`98.03% <85.71%> (-1.97%)`	:arrow_down:
heat/core/dndarray.py	`96.92% <100.00%> (+0.12%)`	:arrow_up:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

Jun 08 '21 09:06 codecov[bot]

rerun tests

Jul 13 '21 08:07 coquelin77

Cool, this looks good.

The latest spec asks for a 'get' field in the outer __partitioned__ dict. Here this should probably be either lambda x: x or a function which first checks if it's being called on the right rank. My current thinking is that it is ok to return real data only when called on the owning rank - at least for SPMD. So returning None for input None would be fine, I think.

I added your code (slightly modified) to my controller/worker wrapper prototype (#823). It has similar restrictions but allows calling get on the owner rank as well as on the controller rank (as required by this programming model).

Jul 13 '21 09:07 fschlimb

Im not seeing the changes that you made when you moved the create partition interface function over.

as for the get field, you mean something which gets the data of a specific partition correct? for example, it would look something like this x.__partitioned__.get(tuple or tile identifier) and it would return that data if it is on the node already otherwise it would be None. is that correct?

Jul 13 '21 12:07 coquelin77

Yes, except of course that its x.__partitioned__['get'](id).

The idea here is that - in particular for frameworks like ray and dask - 'data' might (should) not be raw data but a handle/future. Having a unified way of converting the handle into raw data without explicitly understanding/using the ray/dask/... make this more useful.

Jul 13 '21 12:07 fschlimb

ive implemented that now. i tested it with a small example:

x = ht.arange(8* 3* 2).reshape((8, 3, 2)).resplit(0)
print(x.__partitioned__['get']((0, 0, 0)))

[1] None
[2] None
[0] tensor([[[ 0,  1],
[0]          [ 2,  3],
[0]          [ 4,  5]],
[0] 
[0]         [[ 6,  7],
[0]          [ 8,  9],
[0]          [10, 11]],
[0] 
[0]         [[12, 13],
[0]          [14, 15],
[0]          [16, 17]]], dtype=torch.int32)

Jul 13 '21 12:07 coquelin77

So far this is addressing the producer side. We'd also need the consumer side. HeAT would need a from_partitioned creating a DNDarray from a __partitioned__. If this also goes into other features like constructors or operators can then be considered as well.

Jul 14 '21 16:07 fschlimb

@fschlimb I have added a bit more functionality to from_partitioned. It now supports non-zero split axes and i have also added a from_partition_dict function which does the same thing, but it dont not require a DNDarray as the object being passed in. Instead this one takes the dictionary object and creates the matching DNDarray. It behaves the same way and does a zero-copy when possible. I have added unit tests for this as well. Please have a look

Sep 23 '21 07:09 coquelin77

FYI: A new discussion was initiated with the data-API consortium: https://github.com/data-apis/consortium-feedback/issues/7

Oct 07 '21 09:10 fschlimb

👇 Click on the image for a new way to code review

Make big changes easier — review code in small groups of related files
Know where to start — see the whole change at a glance
Take a code tour — explore the change with an interactive tour
Make comments and review — all fully sync’ed with github

Try it now!

Legend

CodeSee Map Legend

Jun 01 '22 08:06 ghost

@fschlimb @coquelin77 thanks again for all this work. What are the next steps here?

Jun 01 '22 08:06 ClaudiaComito

I guess there are 2 options:

wait for the discussion https://github.com/data-apis/consortium-feedback/issues/7 to conclude. Any input there could help bringing this to a conclusion.
Go ahead and just merge it. It doesn't seem to hurt even it's not as useful as it could be since it's an isolated implementation.

Jun 02 '22 10:06 fschlimb

heat
heat copied to clipboard

Features/772 partition interface

Description

Changes proposed:

Type of change

Due Diligence

Does this change modify the behaviour of other functions? If so, which?

Codecov Report

Legend

heat heat copied to clipboard

Features/772 partition interface

Description

Changes proposed:

Type of change

Due Diligence

Does this change modify the behaviour of other functions? If so, which?

Codecov Report

Legend

heat
heat copied to clipboard