Summary of ABQ comments on `span`
I'm going through the comments on span from Albuquerque to both understand where span stands relative to incorporation into the IS (and as it relates to P0546, P0856, and P0860) and to make sure we haven't made the same mistakes in mdspan. I'll enumerate summaries of the issues here (some with bulleted responses about how we've treated that in mdspan)
Comments from a thread titled "Questions raised while trying to implement " on the lib-ext reflector:
- The comparison operators should work with differing extents, particularly if the extent of one of them is dynamic
-
mdspandoesn't define comparison operators, not evenoperator==(could this be a problem/inconsistency for some people? Probably it's okay to just tell people to convert to span first, especially with the potential for deduction guides on the conversion operators)
-
- The comparison operators should be able to compare
spans of unequal extents, sincestd::equalandstd::lexigraphical_comparedo this -
span<int>andspan<long>should be comparable - The comparison operators take their parameters by
constreference rather than by value, even though the paper itself recommends passingspanby value- We should change the free function
subspanto take anmdspanby value for the same reason
- We should change the free function
- Contiguous iterators and contiguous containers:
spancurrently has (or had?) a constructor that takes aContiguousContainer(i.e., aContainerwhose iterators meet the requirements of contiguous iterators [27.2.1]). This isn't implementable via SFINAE (or any other compile-time technique) since contiguous iterator is a runtime property.-
mdspandoesn't have this constructor. It doesn't look like we're using the term "contiguous" in any contexts that would imply the need for SFINAE, but we should double-check this
-
- If the size of the span (in bytes) is larger than
numeric_limits<ptrdiff_t>::max()(e.g., as could be the case ifsizeof(value_type)is much larger than 1), thenas_bytes()andas_writable_bytes()could introduce undefined behavior.- This applies to us also. We should consider using
size_there instead. (Also, seems like a good motivation for theT[][][]syntax, so that we don't have to use-1for a magic extent value) - A later note says that LEWG has already discussed this and decided to stick with
ptrdiff_t, so maybe we shouldn't change anything.
- This applies to us also. We should consider using
- No reason for the
nullptr_tconstructor, since it's the same as the default constructor- Not sure I agree with this one. Implicit convertibility from a
nullptr_tis a reasonable trait to have. Either way, we omitted discussion/description of this constructor in the paper, even though it's in the wording.
- Not sure I agree with this one. Implicit convertibility from a
Other Pre-meeting comments
-
span<int, 42>.subspan<0>()shouldn't return aspan<dynamic_extent>- This is another artifact of the
dynamic_extentmagic number. It doesn't look like we have that problem because oursubspanextents are always given as normal function arguments rather than template arguments (though this seems like a bit of a shortcoming since it means we can't have a subspan with static extents other than the entirety of the parent's extent for a given rank)
- This is another artifact of the
- all iterator functions shoudl be
constexpr-
mdspanis not iterable (?!?) so this doesn't apply.
-
- Shouldn't have both
lengthandsize
Small Group discussion comments
- Lots of discussion about constructors taking
shared_ptr/unique_ptr(presumably the array partial specializations) - Something indecipherable about propagating
noexcept(presumably relating to comparison operators?) - The static version should work with
std::get, since it is likestd::array(though then it would have to participate in structured binding?)- Probably not that applicable to
mdspan. If you really need this, just convert to aspan
- Probably not that applicable to
Full group discussion
- Maybe considering switching back to
array_view, now that the termviewdoes not imply immutable (presumably with some change tostring_view?) - Poll to remove
unique_ptrandshared_ptrconstructors: 6-8-0-0-0 - Poll to remove
nullptr_tconstructor: 4-5-3-0-0 - Poll to remove
span(pointer, pointer)constructor: 0-0-2-10-0
Other notes (not from span discussion, but encountered while working on it):
- On the conversion constructors and assignment operators:
template< typename UType , typename ... UProperties >
constexpr mdspan( mdspan< UType , UProperties ... > const & ) noexcept;
template< typename UType , typename ... UProperties >
mdspan & operator = ( mdspan< UType , UProperties ... > const & ) noexcept;
One of the requirements is that "V::static_extent(r) == U::static_extent(r) or V::static_extent(r) == std::dynamic_extent for 0 < r < V::rank()". This should probably also include the possibility that U::static_extent(r) == std::dynamic_extent.
-
mdspandefines the constraints here in terms ofstd::is_assignable<U::pointer, V::pointer>, butspandefines it in terms of accessing aU::value_typethrough aV::pointermeeting the rules for well-defined object access defined in [basic.lval]/8. Are these definitions exactly equivalent?
Next question: where does this leave us on P0546
There are some issues that still need to be addressed here, but it seems reasonable to leave this as a separate paper. There are a lot of rules you might want to discuss for AccessProperties in general. For instance, can you unconditionally convert between span<int[], Property1> and span<int[], Property2>? What about span<int[]> and span<int[], Property> and back? I don't think we want to say "only one property", but this implies that the presence or absence of all (exponentially many) combinations of properties needs to be addressed. Moreover, every time someone introduces a new property, there is a precedent that implies that you'll have to implement (or prohibit) exponentially many types (though templates help with this some). We're dealing with this right now in the Executors paper, and it's not an easy problem. I think this discussion alone justifies a "generic" AccessProperties paper that sets out some ground rules.
Some of the major insights from looking at the analogous discussion with executors:
- We should choose between either "all properties allowed with all properties" or "no properties allowed together". I've already said that I think the second one is a no-go. For instance, there's no reason someone shouldn't be able to make a
span<double, atomic_access, bound_checking>. The lesson from executors so far has been that it is a huge mess to allow some combinations and prohibit others. Properties should be orthogonal. If they aren't naturally so, your interface should essentially require them to "fake it" (more on this below) - Wherever possible, attributes of
spans with a combination of properties should be expressible in terms of attributes of thespans with each of the properties individually. For instance, thenoexceptclause ofspan<T, Props...>should be expressible asnoexcept(noexcept(span<T,Props>{}[index_type{}]) && ...)(if fold expressions worked on template parameters like this, which — side note — they should but don't). Places where this sort of expression was possible in the executors proposal have been way easier to deal with than places where it wasn't. I might even go so far as to say that parts of the general class template that can't be expressed this way shouldn't be allowed to be dependent on theAccessProperties.... We should at least explore this option and weigh the costs of potentially overconstraining some properties with the benefits of simplicity.- An example where this could get sticky that immediately comes to mind is the combination of
atomic_accessand something likebounds_checking. The single specialization versions of both would want to overrideoperator[], so which one do you call? Does this require every property to know about every other property, and for you to implement all of the combinatorial possibilities? I suggest that we consider a different route on this. We should explore the implications of the following design principle: for each method or free function overload that a property cares about, it should be implementable as some combination of a pre-method hook and a post-method hook. For instance,span<int[42], atomic_access, bounds_checking>::operator[]could be implemented asaccess_property_traits<atomic_access>::pre_subscript_hook(...), thenaccess_property_traits<bounds_checking>::pre_subscript_hook(...), thenspan<int[42]>::operator[], thenaccess_property_traits<atomic_access>::post_subscript_hook(...), thenaccess_property_traits<bounds_checking>::post_subscript_hook(...), some of which could be no-ops. (This could also be implemented using the ADL-like traits idiom that seems to be favored now over the monolithic traits-class approach). Importantly, if this sort of thing gets people most of what they need, I don't think we should stall the proposal waiting for all of what we need.
- An example where this could get sticky that immediately comes to mind is the combination of
- For similar reasons, interconversion between spans of different properties is sticky. I suggest at least considering an "all-to-all" approach, in a
spans with any set of properties is always convertible to anotherspanwith different properties, as long as their property-less versions are interconvertible. Again, this may be too constraining, but if it gets everyone most of what they need, we should probably run with it. - Another concern is property ordering. For instance, it might be surprising that binding an instance of
span<int[73],UProp,VProp,WProp>to aconstreference tospan<int[73],VProp,WProp,UProp>causes the invocation of the copy constructor. We mostly can get away with this because we're working with a value type, but it's something to keep in mind.
More on this later, but the amount of text here seems to me to justify the existence of a "generic" properties paper to explore the customization point design itself.
We do still need P0856 and P0860 to be presented along with this, though, keeping in mind the following feedback from the end of the notes at the ABQ meeting:
we have precedent for poor design of customization points when they are not used (allocators).
Oh, one more "executors lesson" I forgot: it pays to be very careful and deliberate early on about the difference between properties and their values. For instance, you shouldn't have a property called gpu_memory and one called cpu_memory because these are mutually exclusive (well, UVM aside...). Instead, there should be one property called something like memory_space with different values. Then you don't have to worry about all of this orthogonality stuff with respect to gpu_memory and cpu_memory; you just specify that a property can't have multiple values. (Note that "values" can still be types; they just have a specific internal mutual exclusion that properties don't have with other properties). This is probably obvious to those of us who have worked on Kokkos, but it still seems to be a point of confusion from time to time (even for me) on the executors proposal.
mdspan is not iterable (?!?)
I'm OK with that. Iteration is a 1-D thing. Multidimensional iteration usually wants to know the indices.
Lots of discussion about constructors taking shared_ptr/unique_ptr (presumably the array partial specializations)
I've been prototyping stuff with {shared, weak, unique}_ptr lately. In particular, I wanted to model a memory pool: single owner, with multiple viewers. It's not really right to use unique_ptr for the owner, because of the non-unique viewers. I ended up using shared_ptr for the owner and weak_ptr for the viewers, so that viewers would have an option to test whether the owner still exists.
This is relevant because one important use case for an mdspan would be viewing or slicing some independently managed memory allocation. If we want constructors that take shared_ptr or unique_ptr, we beg the question about the ownership relationship between the input *_ptr and the mdspan. (Contrast that with the weak_ptr constructor that takes a shared_ptr. There, the relationship is clear.) If the mdspan just takes a raw pointer, we don't have to worry about defining that relationship.
for each method or free function overload that a property cares about, it should be implementable as some combination of a pre-method hook and a post-method hook.
Hm, how do we define the order in which the hooks get applied? What if they don't commute? I guess they are supposed to be orthogonal; does that really just mean that properties always need to behave as if these hooks commute?
(Also, wouldn't you want to apply the post hooks in reverse order of the pre hooks?)
for each method or free function overload that a property cares about, it should be implementable as some combination of a pre-method hook and a post-method hook.
Wouldn't that mean that atomic_access has to let non-atomic operator[] do its thing? I don't see how this would work.
Oh wait, you could use C++17 constexpr if for this, right? Just protect the inner non-atomic operator[] with the constexpr if thing based on some combination of the properties.
That constexpr if thing would work like this:
// class ... Props
if constexpr (... && invoke_default_implementation<Props>::value) {
span<int[42]>::operator[] (whatever_arguments_go_here);
}
Hm, how do we define the order in which the hooks get applied? What if they don't commute? I guess they are supposed to be orthogonal; does that really just mean that properties always need to behave as if these hooks commute?
I think the easy answer is that you would specify that the hooks are invoked in the order that the properties are given, and if they don't commute then that's the property implementer's fault (but it at least means that the user can figure out what's going on). It would mean that the properties are order dependent, but if the property implementers are being orthogonal enough this order should almost never matter (a lot of these operations should be compiler-reorderable anyway). Another option is to say that the invocation order is implementation-defined, meaning that any code that relies on an invocation order for these hooks would be disallowed (or something like that).
Wouldn't that mean that atomic_access has to let non-atomic operator[] do its thing? I don't see how this would work
At least as I understand it, the atomic_access implementation we have uses a hash table of sharded locks that is dependent on the address of the mdspan itself (and maybe on the offset?). It's not actually using std::atomic objects. This sharded lock version could easily be implemented in terms of hooks. The constexpr if version isn't ideal because it lacks the extensibility of the hook version.
The implementation of restrict_access in terms of hooks is a lot trickier, though. One option is to expose the type of the underlying pointer via a customization point. For instance (using the old, monolithic attributes class paradigm):
template <>
struct mdspan_property_attributes<restrict_access> {
template <typename MDSpanType>
struct pointer_type {
using type = typename MDSpanType::pointer_type __restrict;
};
// ...
};
It's a bit difficult to see how this composes with other properties, but you could specify that it works as a chain just like the other hooks, for instance.
Example of the hook I suggested above for restrict_access (seems to work with gcc, but nothing else)
@dhollman wrote:
I think the easy answer is that you would specify that the hooks are invoked in the order that the properties are given, and if they don't commute then that's the property implementer's fault (but it at least means that the user can figure out what's going on).
I like this better.
(seems to work with gcc, but nothing else)
Huh, works for recent enough Clang and Intel too, for me at least
Huh, works for recent enough Clang and Intel too, for me at least
By "works", I don't mean that it just compiles. If you look at the instruction output, gcc "correctly" ends the two restrict overloads with movl $11, %eax (that is, return 11 as a constant), while clang and intel both do the actual addition in every case except for with the raw pointers (which they both get "correct" as well). I say "correctly" in quotes because they're allowed to completely ignore restrict, but the "performance correct" thing to do here is to return a constant.
@dhollman Ah, I see, thanks!
At least as I understand it, the atomic_access implementation we have uses a hash table of sharded locks that is dependent on the address of the mdspan itself (and maybe on the offset?).
Looking at the Kokkos code, I think I might be wrong about this — I must have been thinking of the atomics for arbitrary types thing from elsewhere. @crtrott and @hcedwar can correct me on this, but it looks like Kokkos implements the Kokkos::Atomic memory trait for, e.g., operator() and operator[] by modifying the return values of these methods to return something that essentially acts like (an extended version of) std::atomic. I'm still trying to figure out how this fits in to the property ontology. I have serious concerns about the usefulness of these properties in generic code if the the return type of operator() can be changed to something that doesn't support all of the same operations as the return type in the absence of that property, which would be the case if std::atomic were used. On the other hand, if we added something that wraps std::atomic and adds operator overloads, etc., I highly doubt LEWG and EWG would be okay with this kind of thing (i.e., if they thought we should have those operators on std::atomic, they would have put them there). @dsunder how are you addressing this in P0860?
span<T[],atomic_access>::operator[] returns atomic_ref<T>
That is all that is required.
The intent is that construction / setup of the span more efficient than construction of each individual atomic_ref.