
Questions about performance

Open Jaywxy opened this issue 10 months ago • 9 comments

Amazing! I was using spconv 1 before, but I have now switched to spconv 2.1, and it is amazing. Training one epoch used to take 3 hours; now it only takes 1.5 hours, and GPU memory usage has dropped by about 40%. I still have a few unclear questions, though, and I wonder if you could help me answer them or give me some suggestions. Below are the data and the data types passed into the model. How can I modify them to make training more efficient?

[screenshots: input data and dtypes]

Jaywxy avatar Mar 27 '24 01:03 Jaywxy

Hi, can you share which model you trained and which profiling method you used to measure the training time? Thanks

Vanessa-F avatar May 28 '24 22:05 Vanessa-F

The model I use belongs to a senior student in my lab, and I can't share it with you yet. You can get the training time from the training logs; isn't it possible to simply record the time of each epoch?

Jaywxy avatar May 30 '24 08:05 Jaywxy

Actually, I tried to measure the training and inference time of a single sparse convolution layer in several ways, e.g. with time.time(), torch.cuda.Event record, and the PyTorch profiler, but I didn't see any improvement in actual runtime. So may I ask what type of neural network you are using, or its name? I don't need you to share a copy with me.
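
For reference, a minimal sketch of one of these approaches, profiling a single spconv layer with torch.profiler (the layer, input sizes, and voxel generation below are made-up placeholders, assuming spconv 2.x):

import torch
import spconv.pytorch as spconv
from torch.profiler import profile, ProfilerActivity

device = "cuda:0"

# Hypothetical toy input: random active voxels in a 128^3 grid with 16 channels.
channels, spatial_shape, batch_size = 16, [128, 128, 128], 1
coords = torch.randint(0, 128, (1000, 3), dtype=torch.int32, device=device)
coords = torch.unique(coords, dim=0)  # spconv expects unique voxel coordinates
indices = torch.cat(
    [torch.zeros(len(coords), 1, dtype=torch.int32, device=device), coords], dim=1)
features = torch.randn(len(coords), channels, device=device)
x = spconv.SparseConvTensor(features, indices, spatial_shape, batch_size)

layer = spconv.SparseConv3d(channels, channels, kernel_size=3, padding=1).to(device)

layer(x)                     # one untimed warm-up pass so index generation is not measured
torch.cuda.synchronize()

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    layer(x)
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))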

Vanessa-F avatar May 30 '24 17:05 Vanessa-F

Hi, have you solved it? I have the same problem.

lebron-2016 avatar Jun 02 '24 02:06 lebron-2016

I didn't solve my issue; the model I tested uses 2D sparse convolution.

But I have some recommendations for your code:

  1. First, you should warm up your GPU before measuring the time. For example, run 50 epochs on the dense convolution net first, then run 100 epochs for both the dense and the sparse convolution and take the average training time of each (see the sketch after this list).
  2. Try another time measurement method, for example torch.cuda.Event with record(); you can search on Google or ask ChatGPT for other methods.
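
A minimal sketch of the warm-up-then-average idea for a single layer; time_layer, the layer, and the input are placeholders I made up, assuming CUDA events for timing:

import torch

def time_layer(layer, x, warmup=50, iters=100):
    """Average CUDA-event time per forward pass, excluding `warmup` untimed runs."""
    for _ in range(warmup):                    # warm-up runs are not measured
        layer(x)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        layer(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters     # milliseconds per forward pass

# e.g. compare time_layer(dense_block, dense_input) against time_layer(sparse_block, sparse_input)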

Vanessa-F avatar Jun 02 '24 04:06 Vanessa-F

Thanks for your quick reply!

Regarding the first point, have you tried doing this and does it work?

lebron-2016 avatar Jun 02 '24 05:06 lebron-2016

It doesn't work for me. I have tried every method and technique I know for Spconv2d, but I didn't test 3D cases. If you make any progress, please share it with me, thanks.

Vanessa-F avatar Jun 02 '24 05:06 Vanessa-F

In fact, I printed the running time of some modules during my model inference and found that they were not much more efficient than normal convolution. I still don't understand what the problem is.
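
For that kind of per-module timing, one rough sketch (attach_timers is my own hypothetical helper, not a spconv or PyTorch API) uses forward hooks with explicit synchronization, since CUDA kernels launch asynchronously:

import time
import torch

def attach_timers(model, times):
    """Accumulate wall-clock time per named child module (rough sketch, CUDA-synchronized)."""
    def make_hooks(name):
        def pre_hook(module, inputs):
            torch.cuda.synchronize()           # flush pending kernels before starting the timer
            times[name + "/start"] = time.perf_counter()
        def post_hook(module, inputs, output):
            torch.cuda.synchronize()           # wait for this module's kernels to finish
            times[name] = times.get(name, 0.0) + time.perf_counter() - times.pop(name + "/start")
        return pre_hook, post_hook
    for name, child in model.named_children():
        pre, post = make_hooks(name)
        child.register_forward_pre_hook(pre)
        child.register_forward_hook(post)

# Usage (hypothetical): times = {}; attach_timers(my_model, times); my_model(batch); print(times)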

lebron-2016 avatar Jun 02 '24 05:06 lebron-2016

Hi, I used torch.cuda.Event to measure the time and found no problem with it. Do you think this is the right approach? Did you do it this way before? And why is it not feasible to use the time library?

import torch
import torch.nn as nn
import spconv.pytorch as spconv
from spconv.pytorch import SparseConvTensor

device = 'cuda:0'

# Dense input: mostly zeros, with a small 16x16 active patch in one channel.
x_d = torch.zeros((2, 4, 1024, 1024))
x_d[0, 0, 0:16, 0:16] += 1.
x_d = x_d.to(device)
x = SparseConvTensor.from_dense(x_d.permute(0, 2, 3, 1))  # NCHW -> NHWC for spconv

# Sparse conv + BN + ReLU block.
conv_sparse = spconv.SparseConv2d(4, 4, kernel_size=3, stride=2, padding=1, bias=False, dilation=1).to(device)
bn_sparse = nn.BatchNorm1d(4, momentum=0.1).to(device)
conv_bn_relu_sparse = spconv.SparseSequential(conv_sparse, bn_sparse, nn.ReLU(inplace=True)).to(device)

# Equivalent dense conv + BN + ReLU block.
conv_norm = nn.Conv2d(4, 4, kernel_size=3, stride=2, padding=1, bias=False, dilation=1).to(device)
bn_norm = nn.BatchNorm2d(4, momentum=0.1).to(device)
conv_bn_relu_norm = nn.Sequential(conv_norm, bn_norm, nn.ReLU(inplace=True)).to(device)

for i in range(10):
    print("round:", i)

    # Time the dense block with CUDA events.
    start_event = torch.cuda.Event(enable_timing=True)
    end_event = torch.cuda.Event(enable_timing=True)
    start_event.record()
    encoder_output1 = conv_bn_relu_norm(x_d)
    end_event.record()
    end_event.synchronize()
    elapsed_time_ms = start_event.elapsed_time(end_event)
    print(f"conv_bn_relu_norm time: {elapsed_time_ms} milliseconds")

    # Time the sparse block the same way.
    start_event = torch.cuda.Event(enable_timing=True)
    end_event = torch.cuda.Event(enable_timing=True)
    start_event.record()
    encoder_output = conv_bn_relu_sparse(x)
    end_event.record()
    end_event.synchronize()
    elapsed_time_ms = start_event.elapsed_time(end_event)
    print(f"conv_bn_relu_sparse time: {elapsed_time_ms} milliseconds")

[screenshot: per-round timing output from the script above]

lebron-2016 avatar Jun 03 '24 09:06 lebron-2016