stable-dreamfusion
NaN or Inf found in input tensor.
Description
(venv) kai@ns-staging:~/workspace/stable-dreamfusion$ python main.py --text "A red dinosaur in boots." --workspace /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k -O --iters 30000
Namespace(file=None, text='A red dinosaur in boots.', negative='', O=True, O2=False, test=False, six_views=False, eval_interval=1, test_interval=100, workspace='/var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k', seed=None, image=None, image_config=None, known_view_interval=4, IF=False, guidance=['SD'], guidance_scale=100, save_mesh=False, mcubes_resolution=256, decimate_target=50000.0, dmtet=False, tet_grid_size=128, init_with='', lock_geo=False, perpneg=False, negative_w=-2, front_decay_factor=2, side_decay_factor=10, iters=30000, lr=0.001, ckpt='latest', cuda_ray=True, taichi_ray=False, max_steps=1024, num_steps=64, upsample_steps=32, update_extra_interval=16, max_ray_batch=4096, latent_iter_ratio=0.2, albedo_iter_ratio=0, min_ambient_ratio=0.1, textureless_ratio=0.2, jitter_pose=False, jitter_center=0.2, jitter_target=0.2, jitter_up=0.02, uniform_sphere_rate=0, grad_clip=-1, grad_clip_rgb=-1, bg_radius=1.4, density_activation='exp', density_thresh=10, blob_density=5, blob_radius=0.2, backbone='grid', optim='adan', sd_version='2.1', hf_key=None, fp16=True, vram_O=False, w=64, h=64, known_view_scale=1.5, known_view_noise_scale=0.002, dmtet_reso_scale=8, batch_size=1, bound=1, dt_gamma=0, min_near=0.01, radius_range=[3.0, 3.5], theta_range=[45, 105], phi_range=[-180, 180], fovy_range=[10, 30], default_radius=3.2, default_polar=90, default_azimuth=0, default_fovy=20, progressive_view=False, progressive_view_init_ratio=0.2, progressive_level=False, angle_overhead=30, angle_front=60, t_range=[0.02, 0.98], dont_override_stuff=False, lambda_entropy=0.001, lambda_opacity=0, lambda_orient=0.01, lambda_tv=0, lambda_wd=0, lambda_mesh_normal=0.5, lambda_mesh_laplacian=0.5, lambda_guidance=1, lambda_rgb=1000, lambda_mask=500, lambda_normal=0, lambda_depth=10, lambda_2d_normal_smooth=0, lambda_3d_normal_smooth=0, save_guidance=False, save_guidance_interval=10, gui=False, W=800, H=800, radius=5, fovy=20, light_theta=60, light_phi=0, max_spp=1, 
zero123_config='./pretrained/zero123/sd-objaverse-finetune-c_concat-256.yaml', zero123_ckpt='./pretrained/zero123/105000.ckpt', zero123_grad_scale='angle', dataset_size_train=100, dataset_size_valid=8, dataset_size_test=100, exp_start_iter=0, exp_end_iter=30000, images=None, ref_radii=[], ref_polars=[], ref_azimuths=[], zero123_ws=[], default_zero123_w=1)
NeRFNetwork(
(encoder): GridEncoder: input_dim=3 num_levels=16 level_dim=2 resolution=16 -> 2048 per_level_scale=1.3819 params=(6098120, 2) gridtype=hash align_corners=False interpolation=smoothstep
(sigma_net): MLP(
(net): ModuleList(
(0): Linear(in_features=32, out_features=64, bias=True)
(1): Linear(in_features=64, out_features=64, bias=True)
(2): Linear(in_features=64, out_features=4, bias=True)
)
)
(encoder_bg): FreqEncoder: input_dim=3 degree=6 output_dim=39
(bg_net): MLP(
(net): ModuleList(
(0): Linear(in_features=39, out_features=32, bias=True)
(1): Linear(in_features=32, out_features=3, bias=True)
)
)
)
[INFO] loading stable diffusion...
[INFO] loaded stable diffusion!
[INFO] Cmdline: main.py --text A red dinosaur in boots. --workspace /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k -O --iters 30000
[INFO] opt: Namespace(file=None, text='A red dinosaur in boots.', negative='', O=True, O2=False, test=False, six_views=False, eval_interval=1, test_interval=100,
workspace='/var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k', seed=None, image=None, image_config=None, known_view_interval=4, IF=False, guidance=['SD'], guidance_scale=100, save_mesh=False,
mcubes_resolution=256, decimate_target=50000.0, dmtet=False, tet_grid_size=128, init_with='', lock_geo=False, perpneg=False, negative_w=-2, front_decay_factor=2, side_decay_factor=10, iters=30000, lr=0.001,
ckpt='latest', cuda_ray=True, taichi_ray=False, max_steps=1024, num_steps=64, upsample_steps=32, update_extra_interval=16, max_ray_batch=4096, latent_iter_ratio=0.2, albedo_iter_ratio=0, min_ambient_ratio=0.1,
textureless_ratio=0.2, jitter_pose=False, jitter_center=0.2, jitter_target=0.2, jitter_up=0.02, uniform_sphere_rate=0, grad_clip=-1, grad_clip_rgb=-1, bg_radius=1.4, density_activation='exp', density_thresh=10,
blob_density=5, blob_radius=0.2, backbone='grid', optim='adan', sd_version='2.1', hf_key=None, fp16=True, vram_O=False, w=64, h=64, known_view_scale=1.5, known_view_noise_scale=0.002, dmtet_reso_scale=8,
batch_size=1, bound=1, dt_gamma=0, min_near=0.01, radius_range=[3.0, 3.5], theta_range=[45, 105], phi_range=[-180, 180], fovy_range=[10, 30], default_radius=3.2, default_polar=90, default_azimuth=0,
default_fovy=20, progressive_view=False, progressive_view_init_ratio=0.2, progressive_level=False, angle_overhead=30, angle_front=60, t_range=[0.02, 0.98], dont_override_stuff=False, lambda_entropy=0.001,
lambda_opacity=0, lambda_orient=0.01, lambda_tv=0, lambda_wd=0, lambda_mesh_normal=0.5, lambda_mesh_laplacian=0.5, lambda_guidance=1, lambda_rgb=1000, lambda_mask=500, lambda_normal=0, lambda_depth=10,
lambda_2d_normal_smooth=0, lambda_3d_normal_smooth=0, save_guidance=False, save_guidance_interval=10, gui=False, W=800, H=800, radius=5, fovy=20, light_theta=60, light_phi=0, max_spp=1,
zero123_config='./pretrained/zero123/sd-objaverse-finetune-c_concat-256.yaml', zero123_ckpt='./pretrained/zero123/105000.ckpt', zero123_grad_scale='angle', dataset_size_train=100, dataset_size_valid=8,
dataset_size_test=100, exp_start_iter=0, exp_end_iter=30000, images=None, ref_radii=[], ref_polars=[], ref_azimuths=[], zero123_ws=[], default_zero123_w=1)
[INFO] Trainer: df | 2023-07-17_21-08-20 | cuda | fp16 | /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k
[INFO] #parameters: 12204151
[INFO] Loading latest checkpoint ...
[WARN] No checkpoint found, model randomly initialized.
......
==> [2023-07-17_21-23-40] Start Training /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k Epoch 81/300, lr=0.050000 ...
loss=1.0000 (1.0000), lr=0.050000: : 100% 100/100 [00:18<00:00, 5.36it/s]
==> [2023-07-17_21-23-59] Finished Epoch 81/300. CPU=3.9GB, GPU=8.0GB.
++> Evaluate /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k at epoch 81 ...
loss=0.0000 (0.0000): : 100% 8/8 [00:00<00:00, 53.78it/s]
++> Evaluate epoch 81 Finished.
==> [2023-07-17_21-23-59] Start Training /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k Epoch 82/300, lr=0.050000 ...
loss=1.0000 (1.0000), lr=0.050000: : 50% 50/100 [00:09<00:09, 5.39it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 51% 51/100 [00:09<00:09, 5.36it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 52% 52/100 [00:09<00:08, 5.35it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 53% 53/100 [00:09<00:08, 5.35it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 54% 54/100 [00:10<00:08, 5.34it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 55% 55/100 [00:10<00:08, 5.36it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 56% 56/100 [00:10<00:08, 5.33it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 57% 57/100 [00:10<00:08, 5.33it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 58% 58/100 [00:10<00:07, 5.36it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 59% 59/100 [00:10<00:07, 5.34it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 60% 60/100 [00:11<00:07, 5.35it/s]╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/kai/workspace/stable-dreamfusion/main.py:410 in <module> │
│ │
│ 407 │ │ │ test_loader = NeRFDataset(opt, device=device, type='test', H=opt.H, W=opt.W, │
│ 408 │ │ │ │
│ 409 │ │ │ max_epoch = np.ceil(opt.iters / len(train_loader)).astype(np.int32) │
│ ❱ 410 │ │ │ trainer.train(train_loader, valid_loader, test_loader, max_epoch) │
│ 411 │ │ │ │
│ 412 │ │ │ if opt.save_mesh: │
│ 413 │ │ │ │ trainer.save_mesh() │
│ │
│ /home/kai/workspace/stable-dreamfusion/nerf/utils.py:812 in train │
│ │
│ 809 │ │ for epoch in range(self.epoch + 1, max_epochs + 1): │
│ 810 │ │ │ self.epoch = epoch │
│ 811 │ │ │ │
│ ❱ 812 │ │ │ self.train_one_epoch(train_loader, max_epochs) │
│ 813 │ │ │ │
│ 814 │ │ │ if self.workspace is not None and self.local_rank == 0: │
│ 815 │ │ │ │ self.save_checkpoint(full=True, best=False) │
│ │
│ /home/kai/workspace/stable-dreamfusion/nerf/utils.py:1049 in train_one_epoch │
│ │
│ 1046 │ │ │ │ │ save_guidance_path = save_guidance_folder / f'step_{self.global_step │
│ 1047 │ │ │ │ else: │
│ 1048 │ │ │ │ │ save_guidance_path = None │
│ ❱ 1049 │ │ │ │ pred_rgbs, pred_depths, loss = self.train_step(data, save_guidance_path= │
│ 1050 │ │ │ │
│ 1051 │ │ │ # hooked grad clipping for RGB space │
│ 1052 │ │ │ if self.opt.grad_clip_rgb >= 0: │
│ │
│ /home/kai/workspace/stable-dreamfusion/nerf/utils.py:537 in train_step │
│ │
│ 534 │ │ │ else: │
│ 535 │ │ │ │ bg_color = torch.rand(3).to(self.device) # single color random bg │
│ 536 │ │ │
│ ❱ 537 │ │ outputs = self.model.render(rays_o, rays_d, mvp, H, W, staged=False, perturb=Tru │
│ 538 │ │ pred_depth = outputs['depth'].reshape(B, 1, H, W) │
│ 539 │ │ pred_mask = outputs['weights_sum'].reshape(B, 1, H, W) │
│ 540 │ │ if 'normal_image' in outputs: │
│ │
│ /home/kai/workspace/stable-dreamfusion/nerf/renderer.py:1163 in render │
│ │
│ 1160 │ │ if self.dmtet: │
│ 1161 │ │ │ results = self.run_dmtet(rays_o, rays_d, mvp, h, w, **kwargs) │
│ 1162 │ │ elif self.cuda_ray: │
│ ❱ 1163 │ │ │ results = self.run_cuda(rays_o, rays_d, **kwargs) │
│ 1164 │ │ elif self.taichi_ray: │
│ 1165 │ │ │ results = self.run_taichi(rays_o, rays_d, **kwargs) │
│ 1166 │ │ else: │
│ │
│ /home/kai/workspace/stable-dreamfusion/nerf/renderer.py:739 in run_cuda │
│ │
│ 736 │ │ │ │ flatten_rays = raymarching.flatten_rays(rays, xyzs.shape[0]).long() │
│ 737 │ │ │ │ light_d = light_d[flatten_rays] │
│ 738 │ │ │ │
│ ❱ 739 │ │ │ sigmas, rgbs, normals = self(xyzs, dirs, light_d, ratio=ambient_ratio, shadi │
│ 740 │ │ │ weights, weights_sum, depth, image = raymarching.composite_rays_train(sigmas │
│ 741 │ │ │ │
│ 742 │ │ │ # normals related regularizations │
│ │
│ /home/kai/workspace/stable-dreamfusion/venv/lib/python3.10/site-packages/torch/nn/modules/module │
│ .py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/kai/workspace/stable-dreamfusion/nerf/network_grid.py:110 in forward │
│ │
│ 107 │ │ # l: [3], plane light direction, nomalized in [-1, 1] │
│ 108 │ │ # ratio: scalar, ambient ratio, 1 == no shading (albedo only), 0 == only shading │
│ 109 │ │ │
│ ❱ 110 │ │ sigma, albedo = self.common_forward(x) │
│ 111 │ │ │
│ 112 │ │ if shading == 'albedo': │
│ 113 │ │ │ normal = None │
│ │
│ /home/kai/workspace/stable-dreamfusion/nerf/network_grid.py:73 in common_forward │
│ │
│ 70 │ │ # sigma │
│ 71 │ │ enc = self.encoder(x, bound=self.bound, max_level=self.max_level) │
│ 72 │ │ │
│ ❱ 73 │ │ h = self.sigma_net(enc) │
│ 74 │ │ │
│ 75 │ │ sigma = self.density_activation(h[..., 0] + self.density_blob(x)) │
│ 76 │ │ albedo = torch.sigmoid(h[..., 1:]) │
│ │
│ /home/kai/workspace/stable-dreamfusion/venv/lib/python3.10/site-packages/torch/nn/modules/module │
│ .py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/kai/workspace/stable-dreamfusion/nerf/network_grid.py:29 in forward │
│ │
│ 26 │ │
│ 27 │ def forward(self, x): │
│ 28 │ │ for l in range(self.num_layers): │
│ ❱ 29 │ │ │ x = self.net[l](x) │
│ 30 │ │ │ if l != self.num_layers - 1: │
│ 31 │ │ │ │ x = F.relu(x, inplace=True) │
│ 32 │ │ return x │
│ │
│ /home/kai/workspace/stable-dreamfusion/venv/lib/python3.10/site-packages/torch/nn/modules/module │
│ .py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/kai/workspace/stable-dreamfusion/venv/lib/python3.10/site-packages/torch/nn/modules/linear │
│ .py:114 in forward │
│ │
│ 111 │ │ │ init.uniform_(self.bias, -bound, bound) │
│ 112 │ │
│ 113 │ def forward(self, input: Tensor) -> Tensor: │
│ ❱ 114 │ │ return F.linear(input, self.weight, self.bias) │
│ 115 │ │
│ 116 │ def extra_repr(self) -> str: │
│ 117 │ │ return 'in_features={}, out_features={}, bias={}'.format( │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
loss=nan (nan), lr=0.050000: : 60% 60/100 [00:11<00:07, 5.19it/s]
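The progress lines in the log above show the loss turning to nan at step 51/100 of epoch 82, nine steps before the CUDA error is raised. For longer logs, the first bad step can be located with a small helper. This is only a sketch: the function name first_nan_step and the regular expression are illustrative assumptions, not part of stable-dreamfusion, and the pattern is written against the progress-line format shown in this log.

```python
import re

# Matches tqdm-style progress fragments such as
# "loss=nan (nan), lr=0.050000: : 51% 51/100 [00:09<00:09, 5.36it/s]"
# (format assumed from the log above).
PROGRESS_RE = re.compile(r"loss=(\S+) \(\S+\).*?(\d+)/(\d+) \[")


def first_nan_step(lines):
    """Return (step, total) for the first progress line whose loss is nan, else None."""
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match and match.group(1).lower() == "nan":
            return int(match.group(2)), int(match.group(3))
    return None


sample = [
    "loss=1.0000 (1.0000), lr=0.050000: : 50% 50/100 [00:09<00:09, 5.39it/s]NaN or Inf found in input tensor.",
    "loss=nan (nan), lr=0.050000: : 51% 51/100 [00:09<00:09, 5.36it/s]NaN or Inf found in input tensor.",
]
print(first_nan_step(sample))
```

When reading a saved log file, note that progress bars are usually redrawn with carriage returns, so the file may need to be split on "\r" as well as "\n" before being passed in.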
Steps to Reproduce
python main.py --text "A red dinosaur in boots." --workspace /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k -O --iters 30000
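The error message itself suggests rerunning with CUDA_LAUNCH_BLOCKING=1 so that the stack trace points at the kernel that actually failed. A minimal sketch of preparing such a rerun from Python follows; the helper name debug_invocation is hypothetical, and the launch line is left commented out so nothing is started by accident.

```python
import os
import shlex


def debug_invocation(command, base_env=None):
    """Return (argv, env) for rerunning `command` with synchronous CUDA kernel launches."""
    env = dict(os.environ if base_env is None else base_env)
    # Suggested by the error message: makes kernel launches blocking so that
    # the reported stack trace matches the failing call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return shlex.split(command), env


argv, env = debug_invocation(
    'python main.py --text "A red dinosaur in boots." '
    "--workspace /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k "
    "-O --iters 30000"
)
print(argv)
# import subprocess
# subprocess.run(argv, env=env, check=True)  # uncomment to launch the run
```

Synchronous launches slow training down, so this is meant for a single diagnostic run rather than normal use.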
Expected Behavior
Training runs to completion without NaN losses or a crash.
Environment
Ubuntu 22.02 / PyTorch 2.0.1 / CUDA 11.7
Try disabling "--cuda_ray"; that solved the issue for me. My guess is that the error occurs when "--cuda_ray" and "--fp16" are enabled together: the CUDA raymarching code then produces invalid tensor values, while the pure PyTorch path works fine.
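The fp16 part of this guess can be illustrated in isolation. The run above uses fp16=True together with density_activation='exp', and half precision can only represent finite values up to 65504, so an exponential overflows for fairly small inputs. The sketch below uses only the Python standard library to show that limit. It is an assumption that this kind of overflow is what produces the NaN/Inf values here; the thread does not confirm where the bad values first appear, and the helper name fits_in_fp16 is illustrative.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_fp16(value):
    """Return True if `value` can be stored as a finite half-precision float."""
    try:
        struct.pack("<e", value)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


for raw in (10.0, 11.0, 12.0):
    activated = math.exp(raw)
    print(f"exp({raw}) = {activated:.1f}, fits in fp16: {fits_in_fp16(activated)}")
```

exp(11) is still below 65504, but exp(12) is about 1.6e5 and no longer fits, so a raw density output of 12 is already enough to leave the half-precision range.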