
Performance variation among different runs

Open Soroushmehr1 opened this issue 1 year ago • 4 comments

We ran cuSignal on a VM five times and observed large run-to-run variation in the reported execution times for a couple of tests, even though we changed nothing in either the platform or the code. For instance, the runtimes of ISTFT with the "1024-1000000.0-65536-float64" parameters from two runs were 362.2986 us and 665.0717 us. We observed similar differences for a couple of other tests (e.g. ChannelizePoly, CWT, ...) as well. What could be the cause of such large variations?

Soroushmehr1 avatar May 17 '23 21:05 Soroushmehr1

Hi @Soroushmehr1, thanks for using cuSignal.

A couple questions:

  1. What GPUs were you using?
  2. How were you timing your functions? Were you simply using the pytest benchmarks or doing your own performance measuring?

awthomp avatar May 18 '23 14:05 awthomp

Hi Adam, thank you for your reply. I am using an NC H100 v4 VM with two GPUs and 640 GiB of RAM, and I am using the pytest benchmarks to measure the timings. Please let me know if you have any other questions. Best, Reza

Soroushmehr1 avatar May 18 '23 14:05 Soroushmehr1

Hi Reza,

I haven't tested cuSignal on these H100 Azure instances, so I don't immediately know what's going on with the time deltas. One thing you could do is to take a look at a specific function and time it like so:

  1. Run cuSignal function
  2. Start timer
  3. Run cuSignal function in a for loop with N cycles
  4. Stop timer
  5. Examine time delta / N to get time per iteration

I believe our pytest benchmarks include the first run (and therefore CUDA warmup, memory caching, etc.), whereas the manual loop above excludes it; a sketch of that loop is below.
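For example, here is a minimal sketch of that timing pattern, assuming CuPy arrays as input and `cusignal.istft` (which follows the `scipy.signal.istft` signature) as the function under test; the signal size, STFT parameters, and loop count are illustrative, not the exact values from the benchmark IDs:

```python
import time

import cupy as cp
import cusignal

# Illustrative parameters -- not the exact ones encoded in the benchmark IDs.
fs = 1e6
nperseg = 1024
num_samps = 2**20

# Build an STFT to invert (assuming cusignal.stft/istft mirror scipy.signal).
sig = cp.random.randn(num_samps)
f, t, Zxx = cusignal.stft(sig, fs=fs, nperseg=nperseg)

# 1. Run the function once so CUDA context setup, kernel compilation, and
#    memory-pool growth are not included in the measurement.
cusignal.istft(Zxx, fs=fs, nperseg=nperseg)
cp.cuda.Stream.null.synchronize()

# 2. Start timer.
N = 100
start = time.perf_counter()

# 3. Run the function in a for loop with N cycles.
for _ in range(N):
    cusignal.istft(Zxx, fs=fs, nperseg=nperseg)

# Kernel launches are asynchronous, so synchronize before reading the clock.
cp.cuda.Stream.null.synchronize()

# 4. Stop timer.
elapsed = time.perf_counter() - start

# 5. Time delta / N gives the average time per iteration.
print(f"istft: {elapsed / N * 1e6:.1f} us per call over {N} iterations")
```

The `synchronize()` calls matter because CuPy/cuSignal kernel launches return before the GPU work finishes; without them, the timer can stop while kernels are still running and you mostly measure launch overhead.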

awthomp avatar May 18 '23 15:05 awthomp

Hi Adam, thank you for your reply and suggestion. Is there any randomness in the inputs, or in the number of inputs, fed to a function? Among the five runs, we observed the gap mostly in two of them. I have attached a spreadsheet and highlighted the entries with large variation. What could be the reason for these gaps? Best, Reza

Soroushmehr1 avatar May 18 '23 17:05 Soroushmehr1