
Training an indoor scene takes forever

angrysword opened this issue 2 years ago • 6 comments

Thanks for the very cool demo.

I am training an indoor scene with 1100 frames on an RTX 4090; the 3dscene directory contains just the 1100 indoor images. Training has been running for a couple of hours and everything looks fine, but when will it be done? Did I do anything wrong?

```
python localTensoRF/train.py --datadir c:/data/3dscene --logdir c:/data/3dscene/log --fov 60
```

```
Namespace(L1_weight=0.01, N_voxel_final=262144000, N_voxel_init=262144, TV_weight_app=0.0, TV_weight_density=0.0, add_frames_every=100, alpha_mask_thre=0.0001, batch_size=4096, ckpt=None, config=None, data_dim_color=27, datadir='c:/data/3dscene', density_shift=-5, device='cuda:0', distance_scale=25, downsampling=-1, export_mesh=0, fea2denseAct='softplus', fea_pe=0, featureC=128, fov=60.0, logdir='c:/data/3dscene/log', loss_depth_weight_inital=0.1, loss_flow_weight_inital=1, lr_R_init=0.005, lr_basis=0.001, lr_decay_target_ratio=0.1, lr_exposure_init=0.001, lr_i_init=0, lr_init=0.02, lr_t_init=0.0005, lr_upsample_reset=1, max_drift=1, model_name='TensorVMSplit', nSamples=1000000.0, n_init_frames=5, n_iters_per_frame=600, n_iters_reg=300, n_lamb_sh=[24, 24, 24], n_lamb_sigma=[8, 8, 8], n_max_frames=100, n_overlap=30, pos_pe=0, progress_refresh_rate=200, render_only=0, render_path=1, render_test=1, rm_weight_mask_thre=0.001, shadingMode='MLP_Fea_late_view', step_ratio=0.5, subsequence=[0, -1], update_AlphaMask_list=[300], upsamp_list=[100, 150, 200, 250, 300], view_pe=0, vis_every=10000, with_GT_poses=0)
lr decay 0.1
aabb tensor([-2., -2., -2., 2., 2., 2.], device='cuda:0')
grid size [64, 64, 64]
sampling step size: tensor(0.0317, device='cuda:0')
sampling number: 219
pos_pe 0 view_pe 0 fea_pe 0
MLPRender_Fea_late_view(
  (mlp): Sequential(
    (0): Linear(in_features=27, out_features=128, bias=True)
    (1): ReLU(inplace=True)
    (2): Linear(in_features=128, out_features=128, bias=True)
    (3): ReLU(inplace=True)
  )
  (mlp_view): Sequential(
    (0): Linear(in_features=131, out_features=3, bias=True)
  )
)
Iteration 000000: 0.47 it/s
Iteration 000200: 21.43 it/s
Iteration 000400: 22.71 it/s
```
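As a rough sanity check (my own back-of-envelope estimate, not anything from the localrf repo), the config above adds frames progressively and runs `n_iters_per_frame=600` iterations per frame, so total iterations scale with the frame count:

```python
# Hypothetical back-of-envelope estimate of total training time from the
# printed config and throughput. Assumption (mine, not the repo's): total
# iterations grow roughly as n_frames * n_iters_per_frame; the real
# schedule (regularization iterations, upsampling) adds more on top.

def estimate_hours(n_frames: int, n_iters_per_frame: int, iters_per_sec: float) -> float:
    total_iters = n_frames * n_iters_per_frame
    return total_iters / iters_per_sec / 3600.0

# 1100 frames, n_iters_per_frame=600, ~17 it/s as in the logs:
print(f"~{estimate_hours(1100, 600, 17.0):.1f} hours")  # ~10.8 hours
```

This undershoots the ~30 hours reported later in the thread, since throughput drops as the grid is upsampled, but it shows why 1100 frames at 600 iterations each cannot finish in minutes.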

```
aabb tensor([-2., -2., -2., 2., 2., 2.], device='cuda:0')
grid size [640, 640, 640]
sampling step size: tensor(0.0031, device='cuda:0')
sampling number: 2214
upsamping to [640, 640, 640]
reset lr to initial
alpha rest %10.372521
Iteration 073600: 14.53 it/s
Iteration 073800: 17.85 it/s
Iteration 074000: 17.30 it/s
Iteration 074200: 14.65 it/s
Iteration 074400: 15.19 it/s
Iteration 074600: 14.89 it/s
Iteration 074800: 17.70 it/s
Iteration 075000: 18.02 it/s
Iteration 075200: 18.01 it/s
```
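The slowdown in this log follows from the upsampling schedule in the config: `N_voxel_init=262144` is a 64³ grid and `N_voxel_final=262144000` is 640³, and the sampling step shrinks by the same factor of 10 (0.0317 → 0.0031), so each ray needs roughly 10× more samples (219 → 2214). A small sketch of that relationship:

```python
# Relate the voxel counts in the config to the grid sizes printed in the
# logs: 262144 = 64**3 and 262144000 = 640**3.

def grid_resolution(n_voxels: int) -> int:
    """Side length of a cubic grid with n_voxels total voxels."""
    return round(n_voxels ** (1.0 / 3.0))

init_res = grid_resolution(262_144)       # 64
final_res = grid_resolution(262_144_000)  # 640
print(init_res, final_res, final_res // init_res)  # 64 640 10
```

So per-iteration cost grows by about an order of magnitude after the final upsampling, which is why the it/s figures drop and the remaining iterations dominate the wall-clock time.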

angrysword avatar Jun 17 '23 23:06 angrysword

I have the same issue with the forest1 scene. However, I checked the paper: [image: timing table from the paper] So it will take more than a day (or two) to train.

Khoa-NT avatar Jun 19 '23 02:06 Khoa-NT

If that is the case, I have to give up. The cool demo seems to take only 3 minutes to train, and it claims to be 100 times faster.

Hi, which cool demo only needs 3 minutes to train? I don't understand what you mean.

tb2-sy avatar Jun 19 '23 11:06 tb2-sy

> I have the same issue with the forest1 scene. However, I checked the paper: [image: timing table from the paper] So it will take more than a day (or two) to train.

Hi, I also encountered the same problem. I tried another scene (university1) with fewer pictures, only 804, but train.py still had not finished after running on an A40 for two days.

tb2-sy avatar Jun 19 '23 11:06 tb2-sy

> I have the same issue with the forest1 scene. However, I checked the paper: [image: timing table from the paper] So it will take more than a day (or two) to train.

> Hi, I also encountered the same problem. I tried another scene (university1) with fewer pictures, only 804, but train.py still had not finished after running on an A40 for two days.

Hi, my training on forest1 just finished. It took more than 2 days on an A6000 GPU.

Khoa-NT avatar Jun 20 '23 17:06 Khoa-NT

So, if I understand correctly, training takes very long, but what about inference? I'm interested in these architectures because I want to reduce the processing time for scene reconstruction: SfM and MVS with COLMAP already take hundreds of minutes for camera pose estimation and dense reconstruction.

Yarroudh avatar Jun 21 '23 00:06 Yarroudh

NVIDIA's Instant-NGP reconstructs in only a few minutes, though COLMAP is used up front to estimate camera poses. Even then, the total time is much shorter than this, but floater artifacts often occur because of pose estimation errors. My A6000 machine is training forest1 with just over 300 frames. If the result is good, fine; if not, it's a pity, since such a long training time is definitely not practical. Does anyone have a way to reduce training time?

Dragonkingpan avatar Jun 21 '23 05:06 Dragonkingpan

I used a single RTX 4090; training took me 30 hours. Indeed not practical.

DeepJackNotFound avatar Jun 25 '23 01:06 DeepJackNotFound

I added parameters to speed up optimization. They are described here.

ameuleman avatar Jul 04 '23 11:07 ameuleman