CudaMiner

memcpy/kernel concurrency

Open whitesand77 opened this issue 11 years ago • 2 comments

I ran the latest code (commit 132) through the NVIDIA Visual Profiler and noticed there is no concurrency at all between memcpy operations and kernel execution. Is this known?

I did a small test that makes the current code actually use the 2 streams it is written for and got a 3-5% improvement, but there is still very little concurrency with the code in its current state. I don't understand the code well enough to refactor it to be fully concurrent.

On other code I've seen 30-40% improvement with proper use of streams.

whitesand77 avatar Jan 28 '14 19:01 whitesand77
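
For readers unfamiliar with the technique the post refers to, here is a minimal sketch of overlapping copies and kernels with two CUDA streams. It is not CudaMiner's code: `work_kernel`, `run_two_stream_demo`, and the buffer sizes are hypothetical stand-ins chosen for illustration. The key assumptions are that host buffers are pinned (`cudaMallocHost`) and that each chunk of work gets its own non-default stream, since asynchronous copies cannot overlap with kernels otherwise.

```cuda
#include <cuda_runtime.h>

// Return a nonzero code from the enclosing function on any CUDA error.
// Cleanup on the error path is omitted to keep the sketch short.
#define CHECK(call) do { if ((call) != cudaSuccess) return 1; } while (0)

// Hypothetical stand-in for a hashing kernel: adds 1 to every element.
__global__ void work_kernel(unsigned int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1u;
}

// Runs two independent copy-in / kernel / copy-out pipelines, one per stream.
// Returns 0 on success, 1 on a CUDA error, 2 if the results are wrong.
int run_two_stream_demo() {
    const int kStreams = 2;
    const int kChunk = 1 << 20;                       // elements per stream
    const size_t bytes = kChunk * sizeof(unsigned int);

    unsigned int *h_buf[kStreams];
    unsigned int *d_buf[kStreams];
    cudaStream_t stream[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        // Pinned host memory lets cudaMemcpyAsync run asynchronously.
        CHECK(cudaMallocHost((void **)&h_buf[s], bytes));
        CHECK(cudaMalloc((void **)&d_buf[s], bytes));
        CHECK(cudaStreamCreate(&stream[s]));
        for (int i = 0; i < kChunk; ++i)
            h_buf[s][i] = (unsigned int)(s * kChunk + i);
    }

    const int threads = 256;
    const int blocks = (kChunk + threads - 1) / threads;

    // Work in one stream stays ordered; work in different streams may
    // overlap if the hardware and the issue order allow it.
    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaMemcpyAsync(d_buf[s], h_buf[s], bytes,
                              cudaMemcpyHostToDevice, stream[s]));
        work_kernel<<<blocks, threads, 0, stream[s]>>>(d_buf[s], kChunk);
        CHECK(cudaMemcpyAsync(h_buf[s], d_buf[s], bytes,
                              cudaMemcpyDeviceToHost, stream[s]));
    }
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Verify that every element was incremented exactly once.
    int status = 0;
    for (int s = 0; s < kStreams; ++s)
        for (int i = 0; i < kChunk; ++i)
            if (h_buf[s][i] != (unsigned int)(s * kChunk + i) + 1u) status = 2;

    for (int s = 0; s < kStreams; ++s) {
        cudaStreamDestroy(stream[s]);
        cudaFree(d_buf[s]);
        cudaFreeHost(h_buf[s]);
    }
    return status;
}
```

Calling `run_two_stream_demo()` from a `main` and viewing the run in the profiler should show whether the two pipelines actually overlap on a given GPU. On some older GPUs the order in which the calls are issued across streams can decide whether any overlap happens, so a loop that issues all copies first and all kernels second may behave differently from the per-stream order used here.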

It's known. It's an issue order problem.

Reply to this email directly or view it on GitHub: https://github.com/cbuchner1/CudaMiner/issues/83

cbuchner1 avatar Jan 28 '14 19:01 cbuchner1

Use -H 2 for fewer memcpy operations. Soon the remaining part will be eliminated by checking hashes on the GPU.

cbuchner1 avatar Jan 28 '14 19:01 cbuchner1
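
As an illustration of the idea of checking hashes on the GPU to avoid copying them back, here is a rough sketch. It does not reflect how CudaMiner implements this: `toy_hash` is a hypothetical integer mixer standing in for the real hash, and `scan_nonces`, `run_gpu_check_demo`, the target value, and the "no hit" sentinel are all assumptions made for the example. The point is only that the device compares each hash against the target itself, so the host copies back a single 4-byte result rather than one hash per nonce.

```cuda
#include <cuda_runtime.h>

#define CHECK(call) do { if ((call) != cudaSuccess) return 1; } while (0)

// Hypothetical stand-in for the real hash function (NOT scrypt).
__host__ __device__ unsigned int toy_hash(unsigned int nonce) {
    unsigned int x = nonce * 2654435761u;
    x ^= x >> 16;
    x *= 0x85ebca6bu;
    x ^= x >> 13;
    return x;
}

// Each thread hashes one nonce and records it only if it meets the target.
// atomicMin keeps the smallest qualifying nonce in *found.
__global__ void scan_nonces(unsigned int start, unsigned int count,
                            unsigned int target, unsigned int *found) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        unsigned int nonce = start + i;
        if (toy_hash(nonce) <= target) atomicMin(found, nonce);
    }
}

// Returns 0 if the GPU result matches a host reference scan,
// 1 on a CUDA error, 2 on a mismatch.
int run_gpu_check_demo() {
    const unsigned int kNoHit = 0xFFFFFFFFu;   // sentinel: nothing found
    const unsigned int start = 0u;
    const unsigned int count = 1u << 20;
    const unsigned int target = 0x0000FFFFu;   // arbitrary example target

    unsigned int h_found = kNoHit;
    unsigned int *d_found = nullptr;
    CHECK(cudaMalloc((void **)&d_found, sizeof(unsigned int)));
    CHECK(cudaMemcpy(d_found, &h_found, sizeof(unsigned int),
                     cudaMemcpyHostToDevice));

    const unsigned int threads = 256;
    const unsigned int blocks = (count + threads - 1) / threads;
    scan_nonces<<<blocks, threads>>>(start, count, target, d_found);
    CHECK(cudaGetLastError());

    // Only 4 bytes come back, regardless of how many nonces were scanned.
    CHECK(cudaMemcpy(&h_found, d_found, sizeof(unsigned int),
                     cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_found));

    // Host reference: smallest nonce in range whose hash meets the target.
    unsigned int expected = kNoHit;
    for (unsigned int i = 0; i < count; ++i) {
        unsigned int nonce = start + i;
        if (toy_hash(nonce) <= target) { expected = nonce; break; }
    }
    return (h_found == expected) ? 0 : 2;
}
```

Compared with copying every hash to the host and checking it there, this trades a large device-to-host transfer for one atomic operation per qualifying nonce, which should be rare for any realistic target.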