Ex4DGS

Regarding the maximum number of parallel sequences

Open Tianci-Wen opened this issue 1 year ago • 2 comments

Hello! Thank you for open-sourcing this excellent work! I’d like to ask if you train multiple sequences in parallel when using a single RTX 4090, or do you train just one sequence at a time?

I noticed that the GPU utilization for each sequence is relatively low, so I run 5–6 sequences simultaneously. However, I observed that the CPU utilization becomes quite high. Could this potentially affect the final accuracy? Currently, the accuracy I achieve is slightly lower than the results reported in the paper.

In addition, when running 6 sequences in parallel, the results for some scenes are missing files such as chkpnt30000.pth, chkpnt40000.pth, and point_cloud. These results are not saved. Do you know why this might be happening?

Tianci-Wen, Dec 19 '24 04:12

image

Tianci-Wen, Dec 19 '24 04:12

Hi, the CPU utilization is high because we load images just in time to reduce memory usage, which makes the CPU a bottleneck when many sequences run at once. I ran only one job per GPU, and I don't recommend running multiple jobs on a single GPU (it can consume more than 10 GB of GPU memory and kill the job). I think the results are not saved because those jobs are killed by OOM.

The high CPU utilization is unlikely to affect performance. Separately, I've seen some differences in the current preprocessing results and am working to reduce them.

juno181, Dec 31 '24 10:12
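For anyone hitting the same symptoms, here is a minimal Python sketch of the two checks discussed in this thread: listing which scenes lack the expected outputs, and running jobs one at a time while recording their exit codes. It is not part of Ex4DGS. The function names `missing_outputs` and `run_sequentially` are hypothetical, and the one-directory-per-scene layout is an assumption; only the file names come from the question above. The training command itself is left to the caller, since the thread does not state it.

```python
import subprocess
from pathlib import Path

# File names taken from the question in this thread.
# Assumption: each scene writes them directly into its own output directory.
EXPECTED = ("chkpnt30000.pth", "chkpnt40000.pth", "point_cloud")


def missing_outputs(output_root, expected=EXPECTED):
    """Map each scene directory under output_root to the expected entries it lacks."""
    report = {}
    for scene_dir in sorted(Path(output_root).iterdir()):
        if not scene_dir.is_dir():
            continue
        missing = [name for name in expected if not (scene_dir / name).exists()]
        if missing:
            report[scene_dir.name] = missing
    return report


def run_sequentially(commands):
    """Run one command at a time and return (command, returncode) pairs.

    A non-zero code means the job did not finish normally. On POSIX, a
    negative code -N means the process was terminated by signal N.
    """
    results = []
    for cmd in commands:
        completed = subprocess.run(cmd)  # blocks until this job exits
        results.append((cmd, completed.returncode))
    return results
```

Calling `missing_outputs` on the output root after a batch shows which scenes need to be rerun, and the return codes from `run_sequentially` help tell a job that was killed from one that finished but wrote nothing.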