memcpy/kernel concurrency
I ran the latest (commit 132) through the NVIDIA Visual Profiler and noticed there is no concurrency at all between memcpy and kernel execution. Is this known?
I did a small test with the current code to actually use the 2 streams it's coded for and got a 3-5% improvement, but there was still very little concurrency with the code in its current state. I don't understand the code well enough to refactor it to be fully concurrent.
On other code I've seen 30-40% improvement with proper use of streams.
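For anyone following along, here is a minimal sketch of what "proper use of streams" usually means: pinned host buffers, `cudaMemcpyAsync`, and each chunk's copy/kernel/copy sequence issued into its own stream. This is not CudaMiner's code. The `scale` kernel, the buffer names, and the sizes are all made up for illustration, and whether any overlap actually shows up in the profiler depends on the GPU and on the order in which the calls are issued.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; NOT CudaMiner's hashing kernel.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int kStreams = 2;                      // assumed: two streams, as in the post
    const int kChunk = 1 << 20;                  // elements per stream (arbitrary)
    const size_t bytes = kChunk * sizeof(float);

    float *h_buf[kStreams];
    float *d_buf[kStreams];
    cudaStream_t stream[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        // Pinned (page-locked) host memory: async copies from pageable
        // memory cannot overlap with kernel execution.
        cudaMallocHost((void **)&h_buf[s], bytes);
        cudaMalloc((void **)&d_buf[s], bytes);
        cudaStreamCreate(&stream[s]);
        for (int i = 0; i < kChunk; ++i) h_buf[s][i] = 1.0f;
    }

    const int threads = 256;
    const int blocks = (kChunk + threads - 1) / threads;

    // Each stream gets its own upload, kernel, and download. Work in
    // different streams is allowed (not guaranteed) to run concurrently.
    for (int s = 0; s < kStreams; ++s) {
        cudaMemcpyAsync(d_buf[s], h_buf[s], bytes,
                        cudaMemcpyHostToDevice, stream[s]);
        scale<<<blocks, threads, 0, stream[s]>>>(d_buf[s], kChunk, 2.0f);
        cudaMemcpyAsync(h_buf[s], d_buf[s], bytes,
                        cudaMemcpyDeviceToHost, stream[s]);
    }

    // Wait for all streams before touching the host buffers.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("first element of stream 0: %f\n", h_buf[0][0]);

    for (int s = 0; s < kStreams; ++s) {
        cudaStreamDestroy(stream[s]);
        cudaFree(d_buf[s]);
        cudaFreeHost(h_buf[s]);
    }
    return 0;
}
```

On some older GPUs the loop above may still serialize, and issuing all uploads first, then all kernels, then all downloads can behave differently. Comparing both orders in the Visual Profiler is the easiest way to see which one overlaps on a given card.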
It's known. It's an issue order problem.
2014-01-28 whitesand77 [email protected]
Reply to this email directly or view it on GitHub: https://github.com/cbuchner1/CudaMiner/issues/83
Use -H 2 for fewer memcpy operations. Soon the remaining part will be eliminated by checking hashes on the GPU.
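To illustrate the general idea of checking hashes on the GPU (a rough sketch only, not how CudaMiner does or will do it): each thread compares its own result against the target and only a single candidate nonce is copied back, instead of a full buffer of hashes. `toy_hash`, `search`, the target value, and the sentinel are all hypothetical placeholders; `toy_hash` is a trivial integer mix and not scrypt.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder for the real hash; NOT scrypt.
__device__ unsigned int toy_hash(unsigned int nonce)
{
    unsigned int x = nonce * 2654435761u;
    x ^= x >> 15;
    return x;
}

// Each thread tests one nonce and reports it only if it meets the target.
__global__ void search(unsigned int start_nonce, unsigned int target,
                       unsigned int *found)
{
    unsigned int nonce = start_nonce + blockIdx.x * blockDim.x + threadIdx.x;
    if (toy_hash(nonce) <= target)
        atomicMin(found, nonce);   // keep the lowest qualifying nonce
}

int main()
{
    // Sentinel meaning "nothing found"; assumes this value is never a
    // nonce in the batch being searched.
    const unsigned int kNone = 0xFFFFFFFFu;
    unsigned int h_found = kNone;
    unsigned int *d_found = NULL;

    cudaMalloc((void **)&d_found, sizeof(unsigned int));
    cudaMemcpy(d_found, &h_found, sizeof(unsigned int),
               cudaMemcpyHostToDevice);

    // 256 blocks of 256 threads: nonces 0 .. 65535 (arbitrary batch size).
    search<<<256, 256>>>(0u, 0x0000FFFFu, d_found);

    // Only one word comes back, regardless of the batch size.
    cudaError_t err = cudaMemcpy(&h_found, d_found, sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    if (h_found != kNone)
        printf("candidate nonce: %u\n", h_found);
    else
        printf("no candidate in this batch\n");

    cudaFree(d_found);
    return 0;
}
```

The device-to-host traffic per batch shrinks to a single word, so there is far less memcpy left to overlap in the first place.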
2014-01-28 Christian Buchner [email protected]