
clarify behavior of clLinkProgram when linking fails

Open bashbaug opened this issue 1 year ago • 4 comments

Creating an issue based on discussion in PR https://github.com/KhronosGroup/OpenCL-Docs/pull/798.

The behavior of clLinkProgram does not seem to be precisely described and as a result implementations are behaving differently. We need to determine what we can fix now, and if we cannot fix everything, what we would like to fix in a future spec version.

Notes:

  • clLinkProgram creates a new program object, unlike clCompileProgram and clBuildProgram, which operate on program objects that have already been created.
  • clLinkProgram may (or may not!) link asynchronously if a callback function pfn_notify is passed to the function.
  • The spec defines conditions when "the linking operation can begin": if the context, list of devices, input programs and linker options specified are all valid and appropriate host and device resources needed to perform the link are available.

Some things we need to decide, where implementations currently behave differently:

  1. What are the situations when clLinkProgram must return a NULL program object and an error code in errcode_ret? Are these all of the cases where "the linking operation cannot begin", or are there other cases that must return a NULL program object and an error code also?
  2. Are there scenarios when clLinkProgram may return both a new non-NULL program object and an error code in errcode_ret? Or, if an error code is generated, will clLinkProgram also return a NULL program object?
  3. If a callback function is provided, will it always be called, even if an error occurs? If an error occurs, what program object is passed to the callback function?
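
Until these questions are settled, host code probably has to tolerate every combination of returned handle and error code. Here is a minimal sketch of that defensive classification in C. It deliberately does not include CL/cl.h so that it compiles standalone: the program handle is modeled as a plain pointer, the two constants mirror the values defined in CL/cl.h, and the names link_outcome and classify_link_result are hypothetical, not part of the OpenCL API.

```c
#include <stddef.h>

/* Local mirrors of two values from CL/cl.h so the sketch compiles
   without the OpenCL headers; real code would include <CL/cl.h>. */
#define SKETCH_CL_SUCCESS               0
#define SKETCH_CL_LINK_PROGRAM_FAILURE (-17)

/* Hypothetical names, not part of the OpenCL API. */
typedef enum {
    LINK_OK,                /* non-NULL program, CL_SUCCESS (with pfn_notify,
                               this may only mean the link could begin) */
    LINK_FAILED_WITH_LOG,   /* non-NULL program plus an error code:
                               the link log can still be queried */
    LINK_FAILED_NO_PROGRAM, /* NULL program plus an error code:
                               no object to query a log from */
    LINK_INCONSISTENT       /* NULL program but CL_SUCCESS: treat as a bug */
} link_outcome;

/* Classify what came back from a link attempt: the returned handle
   (modeled here as a plain pointer) and the value of errcode_ret. */
link_outcome classify_link_result(const void *program, int errcode)
{
    if (program != NULL)
        return errcode == SKETCH_CL_SUCCESS ? LINK_OK : LINK_FAILED_WITH_LOG;
    return errcode == SKETCH_CL_SUCCESS ? LINK_INCONSISTENT
                                        : LINK_FAILED_NO_PROGRAM;
}
```

In the LINK_FAILED_WITH_LOG case the caller can still query CL_PROGRAM_BUILD_LOG with clGetProgramBuildInfo, and is then responsible for releasing the object with clReleaseProgram.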

(If you're curious to see how your implementation behaves, I put my tester here: https://github.com/bashbaug/SimpleOpenCLSamples/tree/link-program-error-behavior/samples/99_linkprogramerror.)

bashbaug avatar Mar 03 '24 06:03 bashbaug

rusticl:

Running on platform: rusticl
Running on device: Mesa Intel(R) UHD Graphics (CML GT2)


Compiling program object 0x379c3d8...
In program_callback: program = 0x379c3d8, user_data = (nil)
Program build status: 0
Program build log:

End of program callback.

clCompileProgram() returned 0
Program compile log for device Mesa Intel(R) UHD Graphics (CML GT2):



Linking program...
In program_callback: program = 0x379c818, user_data = (nil)
Program build status: -2
Program build log:
(file=input,line=0,column=0,index=0): Unresolved external reference to "func".

End of program callback.

(file=input,line=0,column=0,index=0): Unresolved external reference to "func".

clLinkProgram() returned -17
clLinkProgram() created program object 0x379c818.
Program link log for device Mesa Intel(R) UHD Graphics (CML GT2):
(file=input,line=0,column=0,index=0): Unresolved external reference to "func".

All done.

but asynchronous compilation/linking hasn't been implemented yet.
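
For anyone reading the log above: assuming the tester prints the raw CL_PROGRAM_BUILD_STATUS value and the raw return code, a build status of -2 is CL_BUILD_ERROR and a return value of -17 is CL_LINK_PROGRAM_FAILURE. So this run returned both a non-NULL program object (0x379c818, the same one passed to the callback) and an error code. A small sketch in C that decodes these numbers follows. The function names are hypothetical and the table is intentionally partial; the numeric values are the ones defined in CL/cl.h.

```c
/* Decode a cl_build_status value (as returned for CL_PROGRAM_BUILD_STATUS).
   Hypothetical helper name; values as defined in CL/cl.h. */
const char *build_status_name(int status)
{
    switch (status) {
    case  0: return "CL_BUILD_SUCCESS";
    case -1: return "CL_BUILD_NONE";
    case -2: return "CL_BUILD_ERROR";
    case -3: return "CL_BUILD_IN_PROGRESS";
    default: return "unknown build status";
    }
}

/* Decode a few of the error codes relevant to this thread.
   Partial on purpose: anything else falls through to the default. */
const char *error_code_name(int errcode)
{
    switch (errcode) {
    case   0: return "CL_SUCCESS";
    case -15: return "CL_COMPILE_PROGRAM_FAILURE";
    case -16: return "CL_LINKER_NOT_AVAILABLE";
    case -17: return "CL_LINK_PROGRAM_FAILURE";
    case -34: return "CL_INVALID_CONTEXT";
    case -44: return "CL_INVALID_PROGRAM";
    default:  return "other error code";
    }
}
```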

karolherbst avatar Mar 06 '24 10:03 karolherbst

some of my thoughts:

  1. A cl_program object is the only reliable way to fetch program logs, so I'd say it should depend on that. Usually the CL API returns NULL on errors, but cl_program is special for this reason, and I'd prefer it stay the only reason.
  2. Same as 1, I guess.
  3. I think the spec is clear enough on that: "pfn_notify is a function pointer to a notification routine. The notification routine is a callback function that an application can register and which will be called when the program executable has been built (successfully or unsuccessfully)." Given the reason stated in 1, an attempted build also guarantees that a valid cl_program object exists, since there is no other way to retrieve logs, so the callback will always receive one. The question remains whether the callback should also be called when no compile or link was attempted. That would be a breaking change, though: applications might crash if they now receive a NULL cl_program object and don't handle CL_INVALID_PROGRAM being returned, or left out other error handling they deemed unnecessary.

karolherbst avatar Mar 06 '24 10:03 karolherbst

Are you saying clLinkProgram should return a non-NULL program object even if the linking operation cannot begin, for instance when CL_INVALID_CONTEXT is returned in errcode_ret?

In #798 I made it so that a non-NULL program object is only returned (and passed to the callback) if errcode_ret is either CL_SUCCESS or CL_LINK_PROGRAM_FAILURE.

That would be enough to always fetch logs, but always returning a non-NULL program object also sounds good. I guess this way is more consistent.
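
To make the #798 rule concrete, here is a rough sketch in C of a check a tester could apply to each observed (program, errcode_ret) pair. It is only an illustration of the rule as described in this comment, not text from the PR. The function names are hypothetical, the handle is modeled as a plain pointer, and the constants are local mirrors of the CL/cl.h values so the snippet compiles without the OpenCL headers.

```c
#include <stddef.h>

/* Local mirrors of values from CL/cl.h (assumption: real code would
   include <CL/cl.h> and use CL_SUCCESS / CL_LINK_PROGRAM_FAILURE). */
#define SKETCH_CL_SUCCESS               0
#define SKETCH_CL_LINK_PROGRAM_FAILURE (-17)

/* Under the rule described above, a non-NULL program is expected
   exactly when errcode_ret is CL_SUCCESS or CL_LINK_PROGRAM_FAILURE. */
int program_expected_non_null(int errcode)
{
    return errcode == SKETCH_CL_SUCCESS ||
           errcode == SKETCH_CL_LINK_PROGRAM_FAILURE;
}

/* Returns 1 if an observed result is consistent with that rule,
   0 otherwise. Hypothetical helper, not part of any API. */
int result_matches_rule(const void *program, int errcode)
{
    return (program != NULL) == (program_expected_non_null(errcode) != 0);
}
```

Under this check, the rusticl run earlier in the thread (non-NULL program, return value -17) is consistent with the rule, while a non-NULL program alongside CL_INVALID_CONTEXT (-34) would not be.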

SunSerega avatar Mar 06 '24 12:03 SunSerega

no, I didn't. I only meant that in the callback it won't be NULL.

karolherbst avatar Mar 07 '24 00:03 karolherbst