dash icon indicating copy to clipboard operation
dash copied to clipboard

Container-compatibility of DASH types

Open devreal opened this issue 7 years ago • 7 comments

A thing I noticed (again) while looking at #504: All of the DASH reference types are container_compatible, i.e., dash::GlobIter, dash::GlobRef, and dash::GlobPtr. However, internally all but dash::GlobRef contain local pointers so they should definitely not used in distributed data structures. The culprit is the pointer to the global memory manager and in the case of GlobIter to the pattern. In the current implementation, it is really only safe to copy GlobRef between units. For the other two types, we should prohibit the copying to avoid unintended side-effects until we have a fix for this.

I'd propose that all three types only really require a dart_gptr_t as their sole argument and the information on the global memory type is stored in a central register on each unit to get rid of the local pointer in GlobPtr. A similar fix could be done for the pattern pointer in GlobIter. Both require some significant changes in the infrastructure of DASH.

This is on my list of open issues to be discussed at the F2F next week.

devreal avatar Mar 17 '18 19:03 devreal

Good point.

I'd propose that all three types only really require a dart_gptr_t as their sole argument and the information on the global memory type is stored in a central register on each unit to get rid of the local pointer in GlobPtr. A similar fix could be done for the pattern pointer in GlobIter. Both require some significant changes in the infrastructure of DASH.

The issue with this suggestion is that increment and decrement depend on the native pointer to the underlying memory space. So, if you increment a pointer you probably will not find the corresponding entry in your central register. Another thing is that you will have to communicate and synchronize these registers among communication peers if you copy a GlobPtr from one unit to the other. Or did I miss something here?

What I would suggest is that copying a GlobPtr across units is undefined behavior. However, we can allow to copy a GlobConstPtr. The semantics clearly state that increment and decrement is prohibited. In this case all we need is a dart_gptr_t and we can omit the native pointer to the underlying memory space instance. It also covers all use cases I can think of copying GlobPtr between units.

rkowalewski avatar Mar 19 '18 08:03 rkowalewski

It is exactly the same problem like with shared memory programming in Linux. You cannot simply copy a pointer to shared memory from one process to the other. What you need is a (base, offset)-tuple. That is what POSIX SHMEM or SystemV SHMEM do. In our case, the base is the pointer to underlying memory space instance. Theoretically we may name our global memory instances and provide some kind of lookup service in the form of a global HashMap (Oh no, another issue...). However, this goes too far IMHO.

rkowalewski avatar Mar 19 '18 08:03 rkowalewski

The issue with this suggestion is that increment and decrement depend on the native pointer to the underlying memory space. So, if you increment a pointer you probably will not find the corresponding entry in your central register.

That is exactly the point of the central register: you map segment IDs from the dart_gptr_t to memory spaces. I should clarify that the central register is not a single instance but that every unit stores the data for local and collective allocations it participated in by itself.

You cannot simply copy a pointer to shared memory from one process to the other. What you need is a (base, offset)-tuple.

That is what we have in a dart_gptr_t: a base (codified through the segment ID and stored in DART) and an offset. An increment should only increment the offset in the dart_gptr_t and potentially change the unit stored in the gptr if we cross process boundaries. The information on whether we cross process boundaries in an increment can be taken from the register.

What I would suggest is that copying a GlobPtr across units is undefined behavior.

A simple use-case that would be prohibited by that is the following:

Array<GlobPtr<int>>  arr(dash::size());
arr.local[0] = dash::memalloc<int>(N);
dash::barrier();
auto gptr = arr[0];
// access the second element on the first unit
gptr++;
int i = *gptr;

It should be possible for any unit to access the allocated memory by fetching the GlobPtr and dereferencing it. If the user also stores information on how many elements each unit has it is legitimate to increment that GlobPtr as long as it stays within the allocation's boundaries.

devreal avatar Mar 19 '18 09:03 devreal

This use case makes no sense IMHO. What a user typically wants is to operate on container. If you allow a use case like this you loose any container semantics. The only right implementation for your example is a dash::Array<int> with a pattern which allocates element only on a single unit.

If you do this in plain C++ then you will get in trouble as well. Let's say you have a std::list and store somewhere the pointer to std::begin(list). If you call operator++ you know what happens. A pointer is not an iterator (I hope this statement is correct). And I cannot see any valid use case where we need your example. A valid use case is that you store pointers to the element and apply some operations to the addressed elements (i.e., GlobRef). However, storing a reference in a a DASH container makes no sense either. Maybe we change GlobConstPtr or something similar for this purpose.

EDIT: A pointer is an iterator, however the other way round is not necessarily true.

rkowalewski avatar Mar 19 '18 12:03 rkowalewski

This use case makes no sense IMHO.

A bold statement to make, imo.

What a user typically wants is to operate on container.

If you only consider regular fixed grids, yes. There is a whole world out there beyond that (@tuxamito will present one possible use-case of this for his sparse matrix implementation)

The only right implementation for your example is a dash::Array<int> with a pattern which allocates element only on a single unit.

I can see how well that will scale, allocating thousands of arrays... And creating a custom pattern is not exactly an easy exercise either (certainly nothing a user does swiftly).

If you do this in plain C++ then you will get in trouble as well.

Nope, the following is perfectly valid:

std::vector<int*> v;
v.push_back(new int[N]);

// access the second element of the first allocation
auto val = *(v[0]++);

And I can store any iterator in there as well (as long as I do not modify the underlying container, of course).

Storing a GlobRef in a distributed data structure makes perfect sense but is really limited in what you can do (unless you bother to extract the gptr and use that).

Our statically sized arrays and matrices are fine but they are really inflexible. Anything more dynamic would require expensive collective allocations and copying between arrays. Using arrays and GlobPtr provides a much more flexible approach to users. And I don't see why we should constrain ourselves to statically-sized allocations and take away the freedom of dynamic allocations.

Seriously: what is the use-case for dash::memalloc if we have no way to transfer the resulting GlobPtr to another unit?

devreal avatar Mar 19 '18 13:03 devreal

I'm with @devreal in everything he wrote [1]

Copying GlobPtr between units does not correspond to copying shared pointers between processes. It corresponds to copying shared pointers between threads. This use case even motivated the introduction of threads back in the day besides more efficient context switching.

Processes have separate address spaces, threads have a shared address space. DASH units have a shared address space.

An approach that looks viable to me at first glance is to replace the native pointers by a proxy wrapper of a native pointer. It could be implicitly convertible to a native pointer but also contains information that allows remote units to redirect to the underlying memory space.

[1] Except for marginal stuff unrelated to his point. For example, STL containers do not allow references as value_type, so there is no need to support GlobRef. Also, a reference T& cannot be copied to another reference T&. A reference is a symbolic alias, you would not copy an alias, you would get a new alias, but that's another story.

fuchsto avatar May 16 '18 05:05 fuchsto

Update: as discussed, GlobRef will not be made copyable. Credits to @rkowalewski for making it possible to have GlobPtr in DASH containers. It doesn't look like we'll get GlobIter there anytime soon so moving this away from 0.4.0 but keeping it open to track progress.

devreal avatar May 02 '19 06:05 devreal