
Is there a way to share a context among threads, and if not, why?

menglin0320 opened this issue 2 years ago

Basically, I want to achieve concurrent work with multithreading; my current inference code is pycuda + tensorrt. Why I want to do this: I'm trying to optimize inference throughput for a model with dynamic input, where the size difference between samples can be quite significant. So I want to avoid padding but still do something similar to batching: run several samples concurrently with the same engine. The inference time will still be bottlenecked by the biggest sample in the batch, but a lot of FLOPs are saved, and it avoids the possible performance drop from padding too much.
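
For context, here is a minimal sketch of the kind of overlap being described, using multiple CUDA streams within a single context; `infer_async` is a hypothetical stand-in for the TensorRT execution call and is not part of the original post:

```python
import numpy as np
import pycuda.autoinit  # creates and activates a context on the default device
import pycuda.driver as cuda

# Hypothetical stand-in for an asynchronous TensorRT execution call,
# e.g. something along the lines of execute_async_v2(bindings, stream.handle).
def infer_async(d_input, d_output, stream):
    pass

# Differently sized samples, no padding.
samples = [np.random.rand(n).astype(np.float32) for n in (1024, 65536, 4096)]
pending = []

for sample in samples:
    stream = cuda.Stream()
    d_in = cuda.mem_alloc(sample.nbytes)
    d_out = cuda.mem_alloc(sample.nbytes)
    h_out = cuda.pagelocked_empty_like(sample)  # pinned, so the download can be async
    # Work issued on separate streams may overlap on the GPU, so a small
    # sample is not forced to wait for the largest one to finish.
    # (For a truly async upload, `sample` would also need to be page-locked.)
    cuda.memcpy_htod_async(d_in, sample, stream)
    infer_async(d_in, d_out, stream)
    cuda.memcpy_dtoh_async(h_out, d_out, stream)
    pending.append((h_out, stream))

for h_out, stream in pending:
    stream.synchronize()
```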

My current understanding of the problem: if work is submitted in different CUDA contexts there is no real parallelism, just better scheduling. Also, one process can only have one CUDA context, but threads can share contexts. That may not be true for pycuda, so I need to check. But I haven't found anything about how to share one context among threads yet.

I found the official example for using multithreading with pycuda: link

In that example each thread calls Device.make_context(), so there's not much difference between multithreading and multiprocessing then. If each thread owns its own context, there is no real concurrent work.
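
For reference, the pattern in that example looks roughly like the following sketch (paraphrased, not the verbatim example): each thread makes, uses, and pops its own context.

```python
import threading
import pycuda.driver as cuda

cuda.init()

def worker():
    # Each thread creates and owns its own context, as in the official
    # example; nothing CUDA-related is shared between threads.
    ctx = cuda.Device(0).make_context()
    try:
        cuda.mem_alloc(1024)  # this allocation belongs to this thread's context
    finally:
        ctx.pop()  # deactivate the context before the thread exits

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```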

My question: I just wonder if my understanding of contexts is right, and whether there is a way to share a context between different threads. I feel it should be possible; if it is not possible with pycuda, can anyone briefly explain why?

menglin0320, Aug 04 '21 21:08

I just learned about the GIL. So we have to use multiprocessing, and diving into MPI is my only option if I want to stick with Python?

menglin0320, Aug 04 '21 23:08

This also came up recently in https://github.com/inducer/pycuda/issues/305#issuecomment-887761332. PyCUDA currently assumes that each context can only be active in a single thread. It appears that this was true up until CUDA 4, but this restriction was then lifted. I would welcome a PR that removes this restriction. It might be as simple as deleting the check for uniqueness of activation.
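
To illustrate, if that restriction were removed, sharing a single context across threads might look like the sketch below. This does not work in PyCUDA as described in this thread, precisely because of the uniqueness check:

```python
import threading
import pycuda.driver as cuda

cuda.init()
ctx = cuda.Device(0).make_context()
ctx.pop()  # release the shared context from the main thread

def worker():
    ctx.push()  # make the one shared context current in this thread
    try:
        cuda.mem_alloc(1024)  # would live in the shared context
    finally:
        ctx.pop()

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```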

inducer, Aug 07 '21 20:08

Yes, I saw that too. I may switch to polygraphy instead. I don't know much about CUDA wrappers; I chose pycuda only because the official tensorrt example used it, but the test code in tensorrt uses polygraphy instead. It seems like polygraphy hides all the details about contexts, though. I hope it works.

menglin0320, Aug 09 '21 16:08

The nvidia guys told me that tensorrt inference releases the GIL. That's good news; if the new feature gets added, it could be useful in this case.

menglin0320, Aug 09 '21 16:08

How come this got closed? The question you raised is a real concern to my mind, and I wouldn't be opposed to the issue staying open.

inducer, Aug 21 '21 02:08

Okay, just one quick question: I found that pycuda is a lot quicker than polygraphy when doing memcpy. Do you know the reason?

menglin0320, Aug 21 '21 02:08

PyCUDA isn't doing anything special with memcpy. It just calls the corresponding CUDA function. For an additional speed boost, you can use "page-locked" memory (on the host side).
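
For illustration, a minimal sketch of the pageable vs. page-locked host-memory comparison; the buffer size is arbitrary:

```python
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda

n = 1 << 24  # ~64 MB of float32
d_buf = cuda.mem_alloc(n * 4)

# Pageable host memory: the driver has to stage the transfer internally.
pageable = np.zeros(n, dtype=np.float32)
cuda.memcpy_htod(d_buf, pageable)

# Page-locked ("pinned") host memory: the GPU can DMA directly from it,
# which is typically faster and is required for genuinely async copies.
pinned = cuda.pagelocked_empty(n, dtype=np.float32)
cuda.memcpy_htod(d_buf, pinned)
```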

inducer, Aug 21 '21 02:08

k, I'll try to read the source code myself...

menglin0320, Aug 21 '21 04:08

@menglin0320 I'm in a similar situation, and I have a solution: How to perform different models in different gpu simultaneously

zacario-li, May 28 '22 04:05

Would fixing this also make it possible for CUDA objects to be safely garbage collected in threads where the context is not current?

bmerry, Sep 29 '23 13:09