pyopencl
pyopencl copied to clipboard
`Array.events` spuriously synchronizes on read-after-read
- [ ] There should be
Array.write_events
andArray.read_events
, withArray.events
aliasingwrite_events
under a deprecation. - [ ] Arguments to all the generated kernels (elementwise etc.) should declare whether they're writing and automatically add to the appropriate
events
list. - [ ] Audit code calling into the kernels to avoid duplicate event insertion.
cc @alexfikl @zachjweiner
If I'm recalling correctly, I think the idea was that once we can distinguish between read- and write-synchronization points, there will be no need to enforce the order of two kernels based on an array that they both (only) read. For example, the operations y = 2 * x
and y_h = 2 * x.get()
do not need to be ordered, but the first would add an event to x.events
which the call to get
would wait for. In principle they could execute asynchronously, but Array
's event tracking enforces this synchronization at the moment (because it has no way to know that neither execution modifies x
).
Does this make sense?
Thanks for explaining, I remember now. The current Array.events
system needs to be redone to wait on reads before making in-place changes.
!91 adds synchronization for WAR in .get()
, but the overall issue is more general, IMO.
This is really what gl-19 was meant to capture. (which we'll have this issue subsume)
@zachjweiner Could you remind me which way this issue went? According to my experiments, I believe 2*x
does not currently add the resulting event to x.events
(as of a60b4ed), so there's is not currently spurious synchronization. But I could see an argument that there should be synchronization for 2*x
, to ensure that nobody modifies x
before 2*x
is done reading. Was this the core of the issue?
I believe
2*x
does not currently add the resulting event tox.events
I find the same - it seems events are only appended to self
in array.py and never other
.
But I could see an argument that there should be synchronization for
2*x
, to ensure that nobody modifiesx
before2*x
is done reading. Was this the core of the issue?
Yes, I believe so. And I definitely believe the argument.
Looking back at gl-18 I see we discussed this issue with regard to calls to Array.get
. And it does seem that the MR did add the get event to Array.events
.
PS - I think #302 can be (read: was intended to be) closed.
@inducer I was looking a bit through the code to see how to best solve the second point here, by having the kernels be more explicit about inputs and outputs.
First, it seems like these are the kernels that need work (missed any?):
-
pyopencl.elementwise
:ElementwiseKernel
-
pyopencl.reduction
:ReductionKernel
-
pyopencl.scan
:GenericScanKernel
,GenericDebugScanKernel
,LegacyScanKernelBase
(?)
They all take an arguments
parameter in __init__
and __call__
takes the arrays as positional arguments in *args
. What would be a good way to mark these as input / output in a backwards compatible way?
- Add an
inputs
(defaulting toarguments
) andoutputs
(defaulting toarguments
) to__init__
? Not sure this can be done in a backwards compatible way though, since everything is a positional argument. - Add an
outputs: List[int]
to__init__
to mark which of the arguments inarguments
are outputs? - We could mark
pyopencl.tools.Argument
with a flag and have any new code pass those in__init__(arguments)
? Something like howArrayArg
seems to work inloopy
. Not sure what to do about the case wherearguments
is a string though (deprecated?). - Some keyword arg in
__call__
denoting which arrays in*args
are inputs / outputs? Probably shouldn't mess around with call sites though, so I don't like this.. - Introduce some new kernel classes with a better interface? Kind of annoying since the good names are already taken..
- [YOUR IDEA HERE :heart:]
EDIT: Should have probably said read / write instead of input / output above, since that's clearer.
- We could mark
pyopencl.tools.Argument
with a flag and have any new code pass those in__init__(arguments)
? Something like howArrayArg
seems to work inloopy
. Not sure what to do about the case wherearguments
is a string though (deprecated?).
I think this is the sanest. For the string arguments, this could come from either a const
or a __read_only__
(sp? It's a CL thing.)