nerfstudio
Training does not use the Nvidia GPU. Is this normal?
When running ns-train nerfacto --data data/nerfstudio/poster, only the CPU of my laptop is getting some load.
According to the task manager, the load of my Nvidia GPU is 0%.
As a result, the training is quite slow, and I’m actually surprised that it works at all, considering CUDA was supposed to be required.
Am I missing something? What info can I provide to help figure this one out?
OS : Windows 10
GPU : Nvidia RTX 3050 laptop
CUDA : cuda_11.8.r11.8/compiler.31833905_0 (version returned by nvcc)
Installation was done in a conda env as per the docs. Nerfstudio was installed with pip (nerfstudio==0.1.10).
It is not normal for the GPU to go unused. I'm actually surprised that training works at all, given that TinyCudaNN requires a GPU. Can you post the full logs from before training starts?
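Before digging through the logs, a quick sanity check is to ask PyTorch directly whether it can see a CUDA device. The following is a minimal sketch, assuming it is run inside the same conda env used for ns-train; the helper name `describe_cuda` is hypothetical, while `torch.cuda.is_available()` and `torch.cuda.get_device_name()` are standard PyTorch calls.

```python
import importlib.util


def describe_cuda() -> str:
    """Return a one-line summary of whether PyTorch can see a CUDA device."""
    # Avoid a hard ImportError if torch is missing from this environment.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} cannot see a CUDA device"

    # Index 0 is the first CUDA device visible to this process.
    name = torch.cuda.get_device_name(0)
    return f"PyTorch {torch.__version__} sees CUDA device 0: {name}"


if __name__ == "__main__":
    print(describe_cuda())
```

If this reports that no CUDA device is visible, the problem is in the PyTorch/CUDA installation rather than in nerfstudio itself.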
Sure
Logs before training starts
❯ ns-train nerfacto --data data/nerfstudio/poster
[17:52:51] Using --data alias for --data.pipeline.datamanager.dataparser.data train.py:223
──────────────────────────────────────────────────────── Config ────────────────────────────────────────────────────────
Config(
    output_dir=WindowsPath('outputs'),
    method_name='nerfacto',
    experiment_name=None,
    timestamp='2022-11-21_175251',
    machine=MachineConfig(seed=42, num_gpus=1, num_machines=1, machine_rank=0, dist_url='auto'),
    logging=LoggingConfig(
        relative_log_dir=WindowsPath('.'),
        steps_per_log=10,
        max_buffer_size=20,
        local_writer=LocalWriterConfig(
            _target=<class 'nerfstudio.utils.writer.LocalWriter'>,
            enable=True,
            stats_to_track=(
                <EventName.ITER_TRAIN_TIME: 'Train Iter (time)'>,
                <EventName.TRAIN_RAYS_PER_SEC: 'Train Rays / Sec'>,
                <EventName.CURR_TEST_PSNR: 'Test PSNR'>,
                <EventName.VIS_RAYS_PER_SEC: 'Vis Rays / Sec'>,
                <EventName.TEST_RAYS_PER_SEC: 'Test Rays / Sec'>
            ),
            max_log_size=10
        ),
        enable_profiler=True
    ),
    viewer=ViewerConfig(
        relative_log_filename='viewer_log_filename.txt',
        start_train=True,
        zmq_port=None,
        launch_bridge_server=True,
        websocket_port=7007,
        ip_address='127.0.0.1',
        num_rays_per_chunk=32768,
        max_num_display_images=512,
        quit_on_train_completion=False
    ),
    trainer=TrainerConfig(
        steps_per_save=2000,
        steps_per_eval_batch=500,
        steps_per_eval_image=500,
        steps_per_eval_all_images=25000,
        max_num_iterations=30000,
        mixed_precision=True,
        relative_model_dir=WindowsPath('nerfstudio_models'),
        save_only_latest_checkpoint=True,
        load_dir=None,
        load_step=None,
        load_config=None
    ),
    pipeline=VanillaPipelineConfig(
        _target=<class 'nerfstudio.pipelines.base_pipeline.VanillaPipeline'>,
        datamanager=VanillaDataManagerConfig(
            _target=<class 'nerfstudio.data.datamanagers.base_datamanager.VanillaDataManager'>,
            dataparser=NerfstudioDataParserConfig(
                _target=<class 'nerfstudio.data.dataparsers.nerfstudio_dataparser.Nerfstudio'>,
                data=WindowsPath('data/nerfstudio/poster'),
                scale_factor=1.0,
                downscale_factor=None,
                scene_scale=1.0,
                orientation_method='up',
                center_poses=True,
                auto_scale_poses=True,
                train_split_percentage=0.9
            ),
            train_num_rays_per_batch=4096,
            train_num_images_to_sample_from=-1,
            train_num_times_to_repeat_images=-1,
            eval_num_rays_per_batch=4096,
            eval_num_images_to_sample_from=-1,
            eval_num_times_to_repeat_images=-1,
            eval_image_indices=(0,),
            camera_optimizer=CameraOptimizerConfig(
                _target=<class 'nerfstudio.cameras.camera_optimizers.CameraOptimizer'>,
                mode='SO3xR3',
                position_noise_std=0.0,
                orientation_noise_std=0.0,
                optimizer=AdamOptimizerConfig(
                    _target=<class 'torch.optim.adam.Adam'>,
                    lr=0.0006,
                    eps=1e-08,
                    weight_decay=0.01
                ),
                scheduler=SchedulerConfig(
                    _target=<class 'nerfstudio.engine.schedulers.ExponentialDecaySchedule'>,
                    lr_final=5e-06,
                    max_steps=10000
                ),
                param_group='camera_opt'
            )
        ),
        model=NerfactoModelConfig(
            _target=<class 'nerfstudio.models.nerfacto.NerfactoModel'>,
            enable_collider=True,
            collider_params={'near_plane': 2.0, 'far_plane': 6.0},
            loss_coefficients={'rgb_loss_coarse': 1.0, 'rgb_loss_fine': 1.0},
            eval_num_rays_per_chunk=32768,
            near_plane=0.05,
            far_plane=1000.0,
            background_color='last_sample',
            num_proposal_samples_per_ray=(256, 96),
            num_nerf_samples_per_ray=48,
            proposal_update_every=5,
            proposal_warmup=5000,
            num_proposal_iterations=2,
            use_same_proposal_network=False,
            proposal_net_args_list=[
                {'hidden_dim': 16, 'log2_hashmap_size': 17, 'num_levels': 5, 'max_res': 64},
                {'hidden_dim': 16, 'log2_hashmap_size': 17, 'num_levels': 5, 'max_res': 256}
            ],
            interlevel_loss_mult=1.0,
            distortion_loss_mult=0.002,
            use_proposal_weight_anneal=True,
            use_average_appearance_embedding=True,
            proposal_weights_anneal_slope=10.0,
            proposal_weights_anneal_max_num_iters=1000,
            use_single_jitter=True
        )
    ),
    optimizers={
        'proposal_networks': {
            'optimizer': AdamOptimizerConfig(
                _target=<class 'torch.optim.adam.Adam'>,
                lr=0.01,
                eps=1e-15,
                weight_decay=0
            ),
            'scheduler': None
        },
        'fields': {
            'optimizer': AdamOptimizerConfig(
                _target=<class 'torch.optim.adam.Adam'>,
                lr=0.01,
                eps=1e-15,
                weight_decay=0
            ),
            'scheduler': None
        }
    },
    vis='viewer',
    data=WindowsPath('data/nerfstudio/poster')
)
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
[17:52:51] Saving config to: outputs\data\nerfstudio\poster\nerfacto\2022-11-21_175251\config.yml base_config.py:274
[17:52:51] Saving checkpoints to: trainer.py:90
outputs\data\nerfstudio\poster\nerfacto\2022-11-21_175251\nerfstudio_models
Using ZMQ port: 51327
========================================================================================================================
[Public] Open the viewer at https://viewer.nerf.studio/versions/22-11-10-0/?websocket_url=ws://localhost:7007
========================================================================================================================
Sending ping to the viewer Bridge Server...
Successfully connected.
Sending ping to the viewer Bridge Server...
Successfully connected.
[WARNING] Not running eval iterations since only viewer is enabled. Use `--vis wandb` or `--vis tensorboard` to run with
eval instead.
disabled tensorboard/wandb event writers
[17:52:52] Auto image downscale factor of 2 nerfstudio_dataparser.py:202
Skipping 0 files in dataset split train. nerfstudio_dataparser.py:91
Auto image downscale factor of 2 nerfstudio_dataparser.py:202
Skipping 0 files in dataset split val. nerfstudio_dataparser.py:91
Setting up training dataset...
Caching all 204 images.
Loading data batch ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
Setting up evaluation dataset...
Caching all 22 images.
Loading data batch ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
No checkpoints to load, training from scratch
[17:53:18] Printing max of 10 lines. Set flag `--logging.local-writer.max-log-size=0` to disable line writer.py:388
wrapping.
testeing testing!
Are you sure your GPU isn't being used? Your training time per iteration is ~90 ms; for reference, I'm using an NVIDIA Tesla (low-end) and it takes ~250 ms.
So it could be Windows Task Manager not reporting GPU usage correctly? I’ll double-check when I get the time, either today or next week.
@Jordan-Pierce it seems like you are right: it is an issue with how Windows Task Manager reports GPU usage. I just checked with nvidia-smi during training, and the GPU is at 70% load, not 0%.
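For anyone who wants to repeat the nvidia-smi check from a script while training runs, here is a rough sketch. It assumes nvidia-smi is on the PATH and uses its `--query-gpu` and `--format=csv,noheader,nounits` options; the function names `parse_gpu_stats` and `read_gpu_stats` are hypothetical helpers, not part of nerfstudio.

```python
import shutil
import subprocess
import time
from typing import Dict, List, Optional

# Ask nvidia-smi for one CSV row per GPU: utilization (%) and memory (MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def _to_int(field: str) -> Optional[int]:
    """Convert a CSV field to int, or None for values such as '[N/A]'."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_stats(csv_text: str) -> List[Dict[str, Optional[int]]]:
    """Parse nvidia-smi CSV output into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem_used, mem_total = line.split(",")
        stats.append(
            {
                "util_percent": _to_int(util),
                "mem_used_mib": _to_int(mem_used),
                "mem_total_mib": _to_int(mem_total),
            }
        )
    return stats


def read_gpu_stats() -> Optional[List[Dict[str, Optional[int]]]]:
    """Run nvidia-smi once; return None if the tool is not on the PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)


if __name__ == "__main__":
    # Sample once per second for five seconds while ns-train is running.
    for _ in range(5):
        print(read_gpu_stats())
        time.sleep(1)
```

A non-zero `util_percent` during training indicates the GPU is doing work even if Task Manager shows 0%. Task Manager's default GPU graphs track engines such as 3D and Copy, so compute load may only show up after switching one of the graphs to a CUDA or compute engine, where the driver exposes one.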