cuda-python
Does CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM only affect the driver API?
Version 11.6.0 added this environment variable. According to the documentation, setting it to 1 makes the default stream a per-thread default stream.
However, looking at the code, it mainly controls whether the ptds/ptsz-suffixed versions are used when loading driver API symbols. Runtime API symbols are linked directly and are not affected.
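To make the observation concrete, here is a minimal sketch of the symbol selection as I understand it. It is an illustrative model, not the actual cuda-python loader: the helper names `per_thread_enabled` and `driver_symbol` are hypothetical, and treating only the value "1" as enabled is an assumption. The `_ptds` and `_ptsz` suffixes are the ones the CUDA driver headers use for per-thread default stream variants.

```python
import os

ENV_VAR = "CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM"


def per_thread_enabled(environ=None):
    """Hypothetical helper: True when the variable is set to "1".

    Assumption: any other value, or an unset variable, means legacy
    default-stream behavior.
    """
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR, "0") == "1"


def driver_symbol(base, takes_stream, per_thread):
    """Hypothetical helper: pick the driver API symbol name to load.

    Entry points that take an explicit stream argument use the _ptsz
    suffix; those that implicitly use the default stream use _ptds.
    """
    if not per_thread:
        return base
    return base + ("_ptsz" if takes_stream else "_ptds")


if __name__ == "__main__":
    enabled = per_thread_enabled()
    print(driver_symbol("cuMemcpyHtoD_v2", takes_stream=False, per_thread=enabled))
    print(driver_symbol("cuLaunchKernel", takes_stream=True, per_thread=enabled))
```

Nothing comparable happens for the runtime API in this model, since those symbols are resolved at link time, which is the gap this question is about.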
So, does this environment variable only affect the driver API?
If so, this should be explained in the documentation.