
When testing with "ns-train nerfacto --data data/nerfstudio/poster"

Open · za970120604 opened this issue 2 years ago · 1 comment

(nerfstudio) undergrad@Enigma:~/nerfstudio$ ns-train nerfacto --data data/nerfstudio/poster
[01:24:43] Using --data alias for --data.pipeline.datamanager.dataparser.data          train.py:222
──────────────────────────────────── Config ────────────────────────────────────
TrainerConfig(
    _target=<class 'nerfstudio.engine.trainer.Trainer'>,
    output_dir=PosixPath('outputs'),
    method_name='nerfacto',
    experiment_name=None,
    timestamp='2023-02-10_012443',
    machine=MachineConfig(seed=42, num_gpus=1, num_machines=1, machine_rank=0, dist_url='auto'),
    logging=LoggingConfig(
        relative_log_dir=PosixPath('.'),
        steps_per_log=10,
        max_buffer_size=20,
        local_writer=LocalWriterConfig(
            _target=<class 'nerfstudio.utils.writer.LocalWriter'>,
            enable=True,
            stats_to_track=(
                <EventName.ITER_TRAIN_TIME: 'Train Iter (time)'>,
                <EventName.TRAIN_RAYS_PER_SEC: 'Train Rays / Sec'>,
                <EventName.CURR_TEST_PSNR: 'Test PSNR'>,
                <EventName.VIS_RAYS_PER_SEC: 'Vis Rays / Sec'>,
                <EventName.TEST_RAYS_PER_SEC: 'Test Rays / Sec'>,
                <EventName.ETA: 'ETA (time)'>
            ),
            max_log_size=10
        ),
        enable_profiler=True
    ),
    viewer=ViewerConfig(
        relative_log_filename='viewer_log_filename.txt',
        start_train=True,
        zmq_port=None,
        launch_bridge_server=True,
        websocket_port=7007,
        ip_address='127.0.0.1',
        num_rays_per_chunk=32768,
        max_num_display_images=512,
        quit_on_train_completion=False,
        skip_openrelay=False,
        codec='VP8',
        local=False
    ),
    pipeline=VanillaPipelineConfig(
        _target=<class 'nerfstudio.pipelines.base_pipeline.VanillaPipeline'>,
        datamanager=VanillaDataManagerConfig(
            _target=<class 'nerfstudio.data.datamanagers.base_datamanager.VanillaDataManager'>,
            dataparser=NerfstudioDataParserConfig(
                _target=<class 'nerfstudio.data.dataparsers.nerfstudio_dataparser.Nerfstudio'>,
                data=PosixPath('data/nerfstudio/poster'),
                scale_factor=1.0,
                downscale_factor=None,
                scene_scale=1.0,
                orientation_method='up',
                center_poses=True,
                auto_scale_poses=True,
                train_split_percentage=0.9,
                depth_unit_scale_factor=0.001
            ),
            train_num_rays_per_batch=4096,
            train_num_images_to_sample_from=-1,
            train_num_times_to_repeat_images=-1,
            eval_num_rays_per_batch=4096,
            eval_num_images_to_sample_from=-1,
            eval_num_times_to_repeat_images=-1,
            eval_image_indices=(0,),
            camera_optimizer=CameraOptimizerConfig(
                _target=<class 'nerfstudio.cameras.camera_optimizers.CameraOptimizer'>,
                mode='SO3xR3',
                position_noise_std=0.0,
                orientation_noise_std=0.0,
                optimizer=AdamOptimizerConfig(
                    _target=<class 'torch.optim.adam.Adam'>,
                    lr=0.0006,
                    eps=1e-08,
                    max_norm=None,
                    weight_decay=0.01
                ),
                scheduler=SchedulerConfig(
                    _target=<class 'nerfstudio.engine.schedulers.ExponentialDecaySchedule'>,
                    lr_final=5e-06,
                    max_steps=10000
                ),
                param_group='camera_opt'
            ),
            camera_res_scale_factor=1.0
        ),
        model=NerfactoModelConfig(
            _target=<class 'nerfstudio.models.nerfacto.NerfactoModel'>,
            enable_collider=True,
            collider_params={'near_plane': 2.0, 'far_plane': 6.0},
            loss_coefficients={'rgb_loss_coarse': 1.0, 'rgb_loss_fine': 1.0},
            eval_num_rays_per_chunk=32768,
            near_plane=0.05,
            far_plane=1000.0,
            background_color='last_sample',
            num_levels=16,
            max_res=2048,
            log2_hashmap_size=19,
            num_proposal_samples_per_ray=(256, 96),
            num_nerf_samples_per_ray=48,
            proposal_update_every=5,
            proposal_warmup=5000,
            num_proposal_iterations=2,
            use_same_proposal_network=False,
            proposal_net_args_list=[
                {'hidden_dim': 16, 'log2_hashmap_size': 17, 'num_levels': 5, 'max_res': 128},
                {'hidden_dim': 16, 'log2_hashmap_size': 17, 'num_levels': 5, 'max_res': 256}
            ],
            interlevel_loss_mult=1.0,
            distortion_loss_mult=0.002,
            orientation_loss_mult=0.0001,
            pred_normal_loss_mult=0.001,
            use_proposal_weight_anneal=True,
            use_average_appearance_embedding=True,
            proposal_weights_anneal_slope=10.0,
            proposal_weights_anneal_max_num_iters=1000,
            use_single_jitter=True,
            predict_normals=False
        )
    ),
    optimizers={
        'proposal_networks': {
            'optimizer': AdamOptimizerConfig(
                _target=<class 'torch.optim.adam.Adam'>,
                lr=0.01,
                eps=1e-15,
                max_norm=None,
                weight_decay=0
            ),
            'scheduler': None
        },
        'fields': {
            'optimizer': AdamOptimizerConfig(
                _target=<class 'torch.optim.adam.Adam'>,
                lr=0.01,
                eps=1e-15,
                max_norm=None,
                weight_decay=0
            ),
            'scheduler': None
        }
    },
    vis='viewer',
    data=PosixPath('data/nerfstudio/poster'),
    relative_model_dir=PosixPath('nerfstudio_models'),
    steps_per_save=2000,
    steps_per_eval_batch=500,
    steps_per_eval_image=500,
    steps_per_eval_all_images=25000,
    max_num_iterations=30000,
    mixed_precision=True,
    save_only_latest_checkpoint=True,
    load_dir=None,
    load_step=None,
    load_config=None,
    log_gradients=False
)
────────────────────────────────────────────────────────────────────────────────
[01:24:43] Saving config to:                                   experiment_config.py:124
           outputs/data-nerfstudio-poster/nerfacto/2023-02-10_012443/config.yml
[01:24:43] Saving checkpoints to:                                        trainer.py:123
           outputs/data-nerfstudio-poster/nerfacto/2023-02-10_012443/nerfstudio_models
Using ZMQ port: 34705

========================================================================================================================
[Public] Open the viewer at https://viewer.nerf.studio/versions/23-02-3-0/?websocket_url=ws://localhost:7007

Sending ping to the viewer Bridge Server...
Successfully connected.
Sending ping to the viewer Bridge Server...
Successfully connected.
[NOTE] Not running eval iterations since only viewer is enabled.
Use --vis wandb or --vis tensorboard to run with eval instead.
Disabled tensorboard/wandb event writers
[01:24:43] Auto image downscale factor of 2                    nerfstudio_dataparser.py:314
           Skipping 0 files in dataset split train.            nerfstudio_dataparser.py:165
           Skipping 0 files in dataset split val.              nerfstudio_dataparser.py:165
Setting up training dataset...
Caching all 204 images.
Setting up evaluation dataset...
Caching all 22 images.
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
No checkpoints to load, training from scratch
Printing profiling stats, from longest to shortest duration in seconds
Traceback (most recent call last):
  File "/home/undergrad/anaconda3/envs/nerfstudio/bin/ns-train", line 8, in <module>
    sys.exit(entrypoint())
  File "/home/undergrad/nerfstudio/scripts/train.py", line 247, in entrypoint
    main(
  File "/home/undergrad/nerfstudio/scripts/train.py", line 233, in main
    launch(
  File "/home/undergrad/nerfstudio/scripts/train.py", line 172, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/home/undergrad/nerfstudio/scripts/train.py", line 87, in train_loop
    trainer.train()
  File "/home/undergrad/nerfstudio/nerfstudio/engine/trainer.py", line 203, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "/home/undergrad/nerfstudio/nerfstudio/utils/profiler.py", line 43, in wrapper
    ret = func(*args, **kwargs)
  File "/home/undergrad/nerfstudio/nerfstudio/engine/trainer.py", line 371, in train_iteration
    _, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
  File "/home/undergrad/nerfstudio/nerfstudio/utils/profiler.py", line 43, in wrapper
    ret = func(*args, **kwargs)
  File "/home/undergrad/nerfstudio/nerfstudio/pipelines/base_pipeline.py", line 255, in get_train_loss_dict
    ray_bundle, batch = self.datamanager.next_train(step)
  File "/home/undergrad/nerfstudio/nerfstudio/data/datamanagers/base_datamanager.py", line 418, in next_train
    ray_bundle = self.train_ray_generator(ray_indices)
  File "/home/undergrad/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/undergrad/nerfstudio/nerfstudio/model_components/ray_generators.py", line 52, in forward
    camera_opt_to_camera = self.pose_optimizer(c)
  File "/home/undergrad/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/undergrad/nerfstudio/nerfstudio/cameras/camera_optimizers.py", line 116, in forward
    outputs.append(exp_map_SO3xR3(self.pose_adjustment[indices, :]))
  File "/home/undergrad/nerfstudio/nerfstudio/cameras/lie_groups.py", line 47, in exp_map_SO3xR3
    skews_square = torch.bmm(skews, skews)
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)

Does this mean that something is wrong with the package dependencies? Thanks for your help!

za970120604, Feb 09 '23 17:02
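The traceback ends in torch.bmm(skews, skews) inside nerfstudio's exp_map_SO3xR3, but the error is raised while creating the cuBLAS handle (cublasCreate), so that line is likely just the first place cuBLAS gets used rather than the cause. A minimal sketch for checking whether the same kind of batched 3x3 matmul also fails outside nerfstudio is below. It assumes only that PyTorch is installed; skew and check_bmm are hypothetical helpers written for this illustration, not nerfstudio functions.

```python
import torch


def skew(w: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: map a batch of 3-vectors (N, 3) to skew-symmetric matrices (N, 3, 3)."""
    a, b, c = w[:, 0], w[:, 1], w[:, 2]
    zeros = torch.zeros_like(a)
    rows = [
        torch.stack([zeros, -c, b], dim=-1),
        torch.stack([c, zeros, -a], dim=-1),
        torch.stack([-b, a, zeros], dim=-1),
    ]
    return torch.stack(rows, dim=1)  # rows become the middle dimension


def check_bmm(device: str = "cuda", n: int = 8):
    """Hypothetical check: run a batched 3x3 matmul like the failing line on `device`.

    Returns (True, result) on success, or (False, error_message) if the backend
    raises, e.g. when the cuBLAS handle cannot be created.
    """
    try:
        skews = skew(torch.randn(n, 3, device=device))
        out = torch.bmm(skews, skews)
        if out.is_cuda:
            torch.cuda.synchronize()  # surface asynchronous CUDA errors here
        return True, out
    except RuntimeError as err:
        return False, str(err)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    ok, result = check_bmm(device)
    print(f"bmm on {device}:", "ok" if ok else f"FAILED: {result}")
```

If this also fails on cuda with the same CUBLAS_STATUS_ALLOC_FAILED message, the problem is most likely in the PyTorch/CUDA setup rather than in nerfstudio itself.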

Looks like a PyTorch CUDA error. Maybe try reinstalling PyTorch and checking your CUDA version. See https://github.com/pytorch/pytorch/issues/20860

tancik, Feb 10 '23 01:02
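Following the suggestion to check the PyTorch and CUDA versions, a small sketch like the one below gathers the relevant details in one place using standard torch.cuda queries; collect_cuda_env is a hypothetical name. Two details from this issue are worth comparing against its output: the tiny-cuda-nn warning reports architecture 61 (compute capability 6.1), which can be checked against the architectures the installed PyTorch binary was built for, and the cuBLAS documentation describes CUBLAS_STATUS_ALLOC_FAILED as a resource allocation failure inside cuBLAS (usually a failed cudaMalloc()), so free GPU memory is reported too where the installed PyTorch version supports it.

```python
import platform


def collect_cuda_env() -> dict:
    """Hypothetical helper: gather versions that matter for CUDA/cuBLAS errors."""
    info = {"python": platform.python_version(), "torch": None}
    try:
        import torch
    except ImportError:
        return info  # PyTorch itself is not importable in this environment

    info["torch"] = torch.__version__
    info["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        major, minor = torch.cuda.get_device_capability(0)
        info["gpu"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = f"{major}.{minor}"
        # Architectures this PyTorch binary was compiled for, e.g. 'sm_61'
        info["compiled_arches"] = torch.cuda.get_arch_list()
        if hasattr(torch.cuda, "mem_get_info"):  # missing in older releases
            free, total = torch.cuda.mem_get_info(0)
            info["gpu_free_mib"] = free // 2**20
            info["gpu_total_mib"] = total // 2**20
    return info


if __name__ == "__main__":
    for key, value in collect_cuda_env().items():
        print(f"{key}: {value}")
```

Comparing torch_cuda_build with the driver version reported by nvidia-smi is a reasonable first step before reinstalling.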