OpenCL-Docs icon indicating copy to clipboard operation
OpenCL-Docs copied to clipboard

clEnqueueSVMUnmap

Open bananAshkar opened this issue 1 year ago • 14 comments

about the function clEnqueueSVMUnmap , what happens if we call this function without calling clEnqueueSVMMap first? this is not well-defined in openCL 3.0 Reference Page (spec)

bananAshkar avatar Mar 09 '23 07:03 bananAshkar

Interesting question.

We currently say for the description of the svm_ptr argument to clEnqueueSVMUnmap:

svm_ptr is a pointer that was specified in a previous call to clEnqueueSVMMap. If svm_ptr is allocated using clSVMAlloc then it must be allocated from the same context from which command_queue was created. Otherwise the behavior is undefined.

IMHO this is ambiguous whether "otherwise the behavior is undefined" applies to just the sentence about being allocated from the same context as the command queue or whether it applies to svm_ptr being a pointer previously used with clEnqueueSVMMap also.

The only relevant error condition is:

CL_INVALID_VALUE if svm_ptr is NULL.

Do you have a preference how you would like this to behave?

It would also be interesting to know how different implementations behave in this case.

bashbaug avatar Mar 09 '23 21:03 bashbaug

In our implementation, we don't check if the buffer has been mapped or not, and we perform the unmapping operation anyway, which may be redundant, but harmless. I believe that returning an error would be better in this case.

aharon-abramson avatar May 18 '23 10:05 aharon-abramson

We (Arm) return CL_INVALID_VALUE in that case. We'd support this becoming the specified behaviour.

kpet avatar Aug 01 '23 16:08 kpet

I tried a couple of Intel OpenCL implementations and it looks like our CPU implementation returns CL_INVALID_VALUE for this case and our GPU implementation returns CL_SUCCESS. I saw similar mixed behavior in a couple of the non-Intel OpenCL implementations I tried, too.

I don't think it would be too hard for our GPU implementation to return an error if this is what we decide to do. Theoretically this could cause an app to fail, since we would be returning an error (and not generating an event) when we weren't previously, but this probably won't be a problem, especially given that other OpenCL implementations are generating an error in this case.

In case it's helpful: https://github.com/bashbaug/SimpleOpenCLSamples/blob/svm-unmap-test/samples/99_svmunmap/main.cpp

bashbaug avatar Aug 02 '23 05:08 bashbaug

Our (Qualcomm) implementation returns CL_INVALID_VALUE for this situation.

lakshmih avatar Aug 21 '23 21:08 lakshmih

We don't return an error in Mesa for this, but it would be consistent with clEnqueueUnmapMemObject to return one.

karolherbst avatar Aug 22 '23 16:08 karolherbst

Discussed in the August 22nd teleconference:

  • Implementations should return an error in this case - CL_INVALID_VALUE.
  • We don't currently have a CTS test for this case, but we'll look into adding one (pending a negative testing discussion).

bashbaug avatar Aug 29 '23 15:08 bashbaug

I have a (draft) PR written up that adds this error condition in #979, but I'm starting to have second thoughts whether it is actually possible to reliably return an error in this case. Consider a tricky case like the following (pseudocode):

event1 = clCreateUserEvent();
clEnqueueSVMUnmap(queue1, ptr, depends on event1);

event0 = clCreateUserEvent();
clEnqueueSVMMap(queue0, ptr, depends on event0);

Observations are:

  1. We can't determine at the call to clEnqueueSVMUnmap whether the ptr is mapped or not - if event0 is set to complete first then the map will execute first, and if event1 is set to complete first then the unmap will execute first.
  2. There are other cases without using user events where an SVM map may execute before an SVM unmap but where there are no guarantees.
  3. Buffer and image memory objects don't have a similar problem for two reasons: the pointer to unmap is (usually) unknown until the memory object is mapped, therefore as a practical matter the call to map the memory object must (usually) occur before the call to unmap the memory object. It is also possible to query the memory object's map count, which provides another mechanism to determine if the memory object has been mapped or not.

Given (3), is the expectation that implementations track which SVM regions have been mapped similarly, even though there is (currently) no queryable map count? Should the example above generate an error because clEnqueueSVMUnmap was called before clEnqueueSVMMap, even though the map may actually execute before the unmap?

bashbaug avatar Oct 17 '23 00:10 bashbaug

Given your interesting examples, I believe we can live without this error case. However, maybe we need a note in the spec to explain why we don't handle this case.

aharon-abramson avatar Oct 17 '23 07:10 aharon-abramson

We observed in the teleconference on October 17th that there could be a similar issue with buffer and image memory objects, also. Specifically, when is the map count updated, and is it used to identify the (synchronous) runtime error?

Consider this example:

ptr = clEnqueueMapBuffer(queue0, buffer, blocking_map = CL_TRUE);
// map was blocking, so now MAP_COUNT is one

event0 = clCreateUserEvent();
clEnqueueUnmapMemObject(queue0, ptr, depends on event0);
// what is the MAP_COUNT now?

event1 = clCreateUserEvent();
clEnqueueUnmapMemObject(queue1, ptr, depends on even1); // is this an error?
// what is the MAP_COUNT now?

I haven't written code to try this case specifically, but I do have code in the OpenCL Intercept Layer to display the queryed map count after mapping and unmapping memory objects and empirically it seems like most implementations upate the map count as soon as the host API executes.

bashbaug avatar Oct 20 '23 14:10 bashbaug

I just noticed that I never implemented CL_MEM_MAP_COUNT in rusticl... Maybe we should have some CTS for it? Though given that the spec isn't really giving a lot of details on CL_MEM_MAP_COUNT that might be a bit challenging to do.

karolherbst avatar Oct 20 '23 15:10 karolherbst

I made a quick tester to try this out: https://github.com/bashbaug/SimpleOpenCLSamples/compare/map-unmap-tester

Results were pretty similar across all of the implementations I've tried, though there were a few minor differences. I agree it would be good to have some CTS tests to exercise this, at least for the common cases.

In summary:

  • The implementations I tested seemed to increase the map count immediately after calling clEnqueueMapBuffer, regardless whether the call was blocking or non-blocking. Note though, I didn't test holding a call back with a user event dependency (yet).
  • Most of the implementations I tested also seemed to decrease the map count immediately after calling clEnqueueUnmapMemObject, even when the call was held back with a user event dependency, though this isn't universal.
  • Regardless of the map count, all implementations I tested returned an error when the number of unmap calls exceeds the number of map calls, even if some of the unmap calls are held back with a user event dependency.

Example output:

Running on platform: Intel(R) OpenCL Graphics
Running on device: Intel(R) Arc(TM) A750 Graphics
Initially: map count is 0
After blocking map: map count is 1
After non-blocking map: map count is 2
After another non-blocking map: map count is 3
After finish: map count is 3
After unmap: map count is 2
After finish: map count is 2
After unmap with event dependency: map count is 1
After unmap with event dependency: map count is 0
Error: commandQueue.enqueueUnmapMemObject(buf, ptr2, &deps2) returned -30
After duplicate unmap with event dependency: map count is 0
After setting event status to complete: map count is 0
After finish: map count is 0

bashbaug avatar Nov 28 '23 16:11 bashbaug

Consider separating into multiple issues:

  • Do we want to return an error from clEnqueueSVMUnmap for an arbitrary pointer?
  • Do we want to return an error from clEnqueueSVMUnamp if it is a valid SVM pointer but it has not been mapped?
  • When should (must?) the map count on a memory object update?
  • When is it an error to unmap a memory object multiple times?

All of these scenarios need CTS tests.

bashbaug avatar Mar 05 '24 17:03 bashbaug

  • We should return an error when calling clEnqueueSVMUnmap for an arbitrary pointer.
  • Delay the check if the pointer has been mapped to the unmap command's execution, i.e. the API should return CL_SUCCESS, but the event should have a negative status.
  • The count should be updated when the map/unmap commands have been completed.
  • It is an error to unmap a memory object with a map counter that equals 0.

aharon-abramson avatar Jun 23 '24 07:06 aharon-abramson