heat
Features/772 partition interface
Description
Implements the `__partitioned__` attribute for the DNDarray for compatibility with daal4py (https://github.com/IntelPython/DPPY-Spec/issues/3). At the moment, this is not used by heat internally; however, there are some ideas about how it could be in the future.
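For orientation, here is a rough sketch of the kind of dictionary `__partitioned__` could expose. The field names follow the DPPY-Spec / data-API discussions and are illustrative only, not necessarily what this PR implements:

```python
import torch

# Illustrative sketch of a __partitioned__ dict for a (8, 3, 2) array split
# along axis 0 across 4 MPI ranks; keys and layout are assumptions here.
local_tensor = torch.arange(12).reshape(2, 3, 2)  # this rank's local chunk

partitioned = {
    "shape": (8, 3, 2),              # global shape of the DNDarray
    "partition_tiling": (4, 1, 1),   # number of partitions per dimension
    "partitions": {                  # one entry per partition in the grid
        (0, 0, 0): {
            "start": (0, 0, 0),      # global offset of this partition
            "shape": (2, 3, 2),      # local shape of this partition
            "data": local_tensor,    # torch.Tensor on the owning rank, else None
            "location": 0,           # MPI rank that owns the partition
        },
        # ... one entry per remaining partition ...
    },
    "locals": [(0, 0, 0)],           # partition keys owned by the calling rank
    "get": lambda x: x,              # turns a 'data' entry into raw local data
}
```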
Issue/s resolved: #772
Changes proposed:
- added `__partitioned__` attribute to DNDarray
Type of change
- New feature (non-breaking change which adds functionality)
Due Diligence
- [x] All split configurations tested
- [x] Multiple dtypes tested in relevant functions
- [x] Documentation updated (if needed)
- [x] Updated changelog.md under the title "Pending Additions"
Does this change modify the behaviour of other functions? If so, which?
no
Codecov Report
Merging #788 (fd7ec83) into main (b9658d4) will increase coverage by 0.00%. The diff coverage is 91.89%.
```diff
@@           Coverage Diff           @@
##             main     #788   +/-   ##
=======================================
  Coverage   91.80%   91.80%
=======================================
  Files          72       72
  Lines       10445    10519    +74
=======================================
+ Hits         9589     9657    +68
- Misses        856      862     +6
```
Flag | Coverage Δ |
---|---|
unit | 91.80% <91.89%> (+<0.01%) :arrow_up: |
Flags with carried forward coverage won't be shown.
Impacted Files | Coverage Δ |
---|---|
heat/core/factories.py | 98.03% <85.71%> (-1.97%) :arrow_down: |
heat/core/dndarray.py | 96.92% <100.00%> (+0.12%) :arrow_up: |
rerun tests
Cool, this looks good.
The latest spec asks for a 'get' field in the outer `__partitioned__` dict. Here this should probably be either `lambda x: x` or a function which first checks if it's being called on the right rank. My current thinking is that it is ok to return real data only when called on the owning rank, at least for SPMD. So returning `None` for input `None` would be fine, I think.
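A minimal sketch of what such a 'get' could look like for the SPMD case; the `None` handling and the function name are illustrative assumptions, not the PR's code:

```python
# Illustrative sketch, not the PR's implementation: a 'get' for SPMD that
# only materializes data on the owning rank.
def spmd_get(handle):
    # On non-owning ranks the partition's 'data' entry is None, so None
    # simply comes back; on the owning rank the handle already is the
    # local torch.Tensor and is returned as-is.
    if handle is None:
        return None
    return handle
```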
I added your code (slightly modified) to my controller/worker wrapper prototype (#823). It has similar restrictions but allows calling `get` on the owner rank as well as on the controller rank (as required by this programming model).
I'm not seeing the changes that you made when you moved the create partition interface function over.
As for the `get` field, you mean something which gets the data of a specific partition, correct? For example, it would look something like `x.__partitioned__.get(tuple or tile identifier)` and it would return that data if it is on the node already, otherwise it would be `None`. Is that correct?
Yes, except of course that it's `x.__partitioned__['get'](id)`.
The idea here is that, in particular for frameworks like ray and dask, 'data' might (should) not be raw data but a handle/future. Having a unified way of converting the handle into raw data without explicitly understanding/using the ray/dask/... API makes this more useful.
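To illustrate the point, a sketch of what 'get' could be for different producers, under the assumption that 'data' holds whatever handle the backend uses; the dask/ray calls are the usual ones from those libraries, nothing from this PR:

```python
# Illustrative only: a per-backend 'get' lets consumers resolve a partition's
# 'data' entry without knowing which framework produced it.

# heat (SPMD): 'data' is already a local torch.Tensor (or None elsewhere)
heat_get = lambda x: x

# dask: 'data' would be a distributed.Future
# dask_get = lambda fut: fut.result()

# ray: 'data' would be an ObjectRef
# ray_get = lambda ref: ray.get(ref)
```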
I've implemented that now. I tested it with a small example:
```python
x = ht.arange(8 * 3 * 2).reshape((8, 3, 2)).resplit(0)
print(x.__partitioned__['get']((0, 0, 0)))
```

```
[1] None
[2] None
[0] tensor([[[ 0,  1],
[0]          [ 2,  3],
[0]          [ 4,  5]],
[0]
[0]         [[ 6,  7],
[0]          [ 8,  9],
[0]          [10, 11]],
[0]
[0]         [[12, 13],
[0]          [14, 15],
[0]          [16, 17]]], dtype=torch.int32)
```
So far this is addressing the producer side. We'd also need the consumer side.
HeAT would need a `from_partitioned` creating a DNDarray from a `__partitioned__`. Whether this also goes into other features like constructors or operators can then be considered as well.
@fschlimb I have added a bit more functionality to `from_partitioned`. It now supports non-zero split axes, and I have also added a `from_partition_dict` function which does the same thing but does not require a DNDarray as the object being passed in. Instead, this one takes the dictionary object and creates the matching DNDarray. It behaves the same way and does a zero-copy when possible. I have added unit tests for this as well. Please have a look.
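A rough usage sketch of the consumer side, assuming the functions end up exposed as `ht.from_partitioned` and `ht.from_partition_dict`; the exact signatures are an assumption for illustration, not taken from the PR:

```python
import heat as ht

# Producer side: any DNDarray exposes the partition interface.
x = ht.arange(8 * 3 * 2).reshape((8, 3, 2)).resplit(0)
parts = x.__partitioned__

# Consumer side (sketch, assumed signatures): rebuild a DNDarray either from
# an object exposing __partitioned__ or directly from the dict itself.
y = ht.from_partitioned(x)          # zero-copy where possible
z = ht.from_partition_dict(parts)   # same, but starting from the dict
```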
FYI: A new discussion was initiated with the data-API consortium: https://github.com/data-apis/consortium-feedback/issues/7
@fschlimb @coquelin77 thanks again for all this work. What are the next steps here?
I guess there are two options:
- Wait for the discussion https://github.com/data-apis/consortium-feedback/issues/7 to conclude. Any input there could help bring this to a conclusion.
- Go ahead and just merge it. It doesn't seem to hurt, even if it's not as useful as it could be, since it's an isolated implementation.