
Sampling training data takes tens of minutes per epoch on Linux with an A800

Open GuohuaQiu1999 opened this issue 1 year ago • 0 comments

With decode 0.10.2 and the same parameter.yaml file, sampling training data takes far longer on Linux with an A800 than on Windows with a GTX 3080Ti. Both runs use the following simulation parameters:

Hardware:
  device: cuda:0
  device_ix: 0
  device_simulation: cuda:0
  num_worker_train: 1
  torch_multiprocessing_sharing_strategy: null
  torch_threads: 4
  unix_niceness: 0
Simulation:
  bg_uniform:
  - 40.0
  - 60.0
  density: null
  emitter_av: 250
  emitter_extent:
  - - -0.5
    - 63.5
  - - -0.5
    - 63.5
  - - -2000
    - 2000
  img_size:
  - 64
  - 64
  intensity_mu_sig:
  - 3000.0
  - 100.0

On Windows with the GTX 3080Ti, sampling training data takes about 8 seconds per epoch. On Linux with the A800, it takes tens of minutes (I did not wait for an epoch's sampling to finish because it took too long). I added print statements at key points in the code and found that the slow step is this call:

frames = self._spline_impl.forward_frames(*self.img_shape,   
                                          frame_ix,   
                                          n_frames,   
                                          xyz_r[:, 0],   
                                          xyz_r[:, 1],   
                                          xyz_r[:, 2],   
                                          ix[:, 0],   
                                          ix[:, 1], 
                                          weight)
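
Print statements placed around a GPU call can attribute time to the wrong line if the call returns before the device work has finished. As a minimal sketch (not part of decode; `timed_call` and its `sync` parameter are hypothetical names introduced here), the call can be timed with an explicit synchronization before and after it. Whether `forward_frames` actually runs asynchronously is an assumption that has not been checked.

```python
import time


def timed_call(fn, *args, sync=None, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds).

    `sync` is an optional zero-argument callable that blocks until
    pending device work is done, e.g. torch.cuda.synchronize.
    It is called once before starting the clock and once before stopping it.
    """
    if sync is not None:
        sync()  # drain earlier work so it is not charged to this call
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if sync is not None:
        sync()  # wait until the work launched by fn has finished
    return result, time.perf_counter() - start


# Hypothetical usage inside the simulation code (requires torch and a GPU):
# frames, seconds = timed_call(
#     self._spline_impl.forward_frames, *self.img_shape, frame_ix, n_frames,
#     xyz_r[:, 0], xyz_r[:, 1], xyz_r[:, 2], ix[:, 0], ix[:, 1], weight,
#     sync=torch.cuda.synchronize,
# )
```

If the time measured this way is still dominated by this one call, the print-based observation holds.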

Despite the slowdown, nvidia-smi showed GPU utilization consistently at 100%, which is very strange. I checked the spline library and found that it was compiled for sm_37. Could this be the cause of the performance problem? The same sm_37 build does not seem to hurt performance on Windows with the GTX 3080Ti, though. Recompiling the library to test this is quite difficult for me, so I am hoping you can help.
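
For context on the sm_37 question: under CUDA's binary compatibility rules, a precompiled `sm_XY` binary runs natively only on devices with the same major compute capability and an equal or higher minor version. Any other device needs embedded PTX to be JIT-compiled by the driver. The sketch below encodes that rule; `runs_natively` is a hypothetical helper, and the architecture list has to be collected by hand (for example from `cuobjdump --list-elf` on the spline library, which is assumed to be available here).

```python
def runs_natively(cubin_archs, device_capability):
    """Return True if any precompiled sm_XY binary matches the device.

    cubin_archs: strings such as "sm_37" or "sm_80" (plain numeric
                 suffixes only; variants like "sm_90a" are not handled).
    device_capability: (major, minor) tuple, e.g. the value returned by
                 torch.cuda.get_device_capability().
    """
    major, minor = device_capability
    for arch in cubin_archs:
        digits = arch.split("_", 1)[1]
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        # Same major version, and the binary's minor must not exceed the device's.
        if arch_major == major and arch_minor <= minor:
            return True
    return False


# Compute capabilities: A800 is 8.0, 3080 Ti is 8.6.
a800_native = runs_natively(["sm_37"], (8, 0))
rtx3080ti_native = runs_natively(["sm_37"], (8, 6))
```

Under this rule neither GPU can run an sm_37-only binary natively, so the architecture mismatch applies to the Windows machine as well. It therefore may not explain the difference between the two systems on its own.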

GuohuaQiu1999 · Nov 06 '24 19:11