nerfstudio
Docker image: ns-process-data images error
Describe the bug
ns-process-data images no longer works in the Docker image ghcr.io/nerfstudio-project/nerfstudio:latest. It creates /processed/001/colmap, /processed/001/images, /processed/001/images_2, /processed/001/images_4, and /processed/001/images_8 in the target directory, but then errors out, and the subsequent ns-train nerfacto --data /workspace/processed/001/ fails as well.
I have no name!@224b575af291://$ ns-process-data images --data /workspace/input/ --output-dir /workspace/processed/001/
Matplotlib created a temporary cache directory at /tmp/matplotlib-ox6qfncq because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
[12:14:17] 🎉 Done copying images with prefix 'frame_'. process_data_utils.py:340
🎉 Done extracting COLMAP features. colmap_utils.py:137
Traceback (most recent call last):
  File "/usr/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/.local/share/nerfstudio'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/.local/share'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/ns-process-data", line 8, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/scripts/process_data.py", line 551, in entrypoint
    tyro.cli(Commands).main()
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/process_data/images_to_nerfstudio_dataset.py", line 114, in main
    self._run_colmap()
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/process_data/colmap_converter_to_nerfstudio_dataset.py", line 214, in _run_colmap
    colmap_utils.run_colmap(
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/process_data/colmap_utils.py", line 146, in run_colmap
    vocab_tree_filename = get_vocab_tree()
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/process_data/colmap_utils.py", line 77, in get_vocab_tree
    vocab_tree_filename.parent.mkdir(parents=True, exist_ok=True)
  File "/usr/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/usr/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/usr/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/.local'
To Reproduce
Steps to reproduce the behavior: run ns-process-data images --data /workspace/input/ --output-dir /workspace/processed/001/
Expected behavior
Processing completes without error.
Looks like the vocab tree is being built in the current working directory. Can you mount a temporary directory into Docker and cd to the mounted path before running the command?
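For anyone who wants to try that suggestion, a minimal untested sketch (the /tmp/nstmp path and the host data path are placeholders):

# start the container with a writable scratch directory mounted as tmpfs
docker run --gpus all -u $(id -u) \
    -v /path/to/data/:/workspace/ \
    --mount type=tmpfs,destination=/tmp/nstmp \
    -p 7007:7007 --rm -it --shm-size=12gb \
    ghcr.io/nerfstudio-project/nerfstudio:latest
# then, inside the container, run the command from the writable directory
cd /tmp/nstmp
ns-process-data images --data /workspace/input/ --output-dir /workspace/processed/001/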
Don't know the real fix, but for anybody stuck, you can bypass it by running as root: start the container with the -u 0 option.
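For example (a sketch, with the host data path as a placeholder; note that anything written to the mounted volume will then be owned by root on the host):

# run everything inside the container as root (uid 0)
docker run --gpus all -u 0 \
    -v /path/to/data/:/workspace/ \
    -p 7007:7007 --rm -it --shm-size=12gb \
    ghcr.io/nerfstudio-project/nerfstudio:latest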
What worked for me was to set the HOME env var to somewhere with write access.
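For instance, pointing HOME at the mounted workspace (an untested sketch; any writable mounted path should work, and the host data path is a placeholder):

# give the container user a writable home directory
docker run --gpus all -u $(id -u) \
    -e HOME=/workspace \
    -v /path/to/data/:/workspace/ \
    -p 7007:7007 --rm -it --shm-size=12gb \
    ghcr.io/nerfstudio-project/nerfstudio:latest

With HOME writable, both the vocab tree (under $HOME/.local/share/nerfstudio, per the first traceback) and the torch hub cache (under $HOME/.cache, per the second) can be created.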
Sorry for the late reply, totally missed it and gave up.
@jkulhanek
Not sure if I understand that: I created a directory /tmp/nerfstudiotmp on my host and ran nerfstudio with:
sudo docker run --gpus all \
    -u $(id -u) \
    -v /home/ahfabi/Documents/nerfstudio/:/workspace/ \
    -v /home/ahfabi/.cache/:/home/user/.cache/ \
    -p 7007:7007 \
    --rm -it \
    --shm-size=12gb \
    --mount type=tmpfs,destination=/tmp/nerfstudiotmp \
    ghcr.io/nerfstudio-project/nerfstudio:latest
Inside the container, I changed into /tmp/nerfstudiotmp and ran the ns-train command there. This was the output:
/tmp/nerfstudiotmp$ ns-train nerfacto --data /workspace/processed/test
Matplotlib created a temporary cache directory at /tmp/matplotlib-axljp85h because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
[18:21:54] Using --data alias for --data.pipeline.datamanager.data train.py:230
──────────────────────────────────────────────────────── Config ────────────────────────────────────────────────────────
TrainerConfig(
    _target=<class 'nerfstudio.engine.trainer.Trainer'>,
    output_dir=PosixPath('outputs'),
    method_name='nerfacto',
    experiment_name=None,
    project_name='nerfstudio-project',
    timestamp='2025-02-16_182154',
    machine=MachineConfig(seed=42, num_devices=1, num_machines=1, machine_rank=0, dist_url='auto', device_type='cuda'),
    logging=LoggingConfig(
        relative_log_dir=PosixPath('.'),
        steps_per_log=10,
        max_buffer_size=20,
        local_writer=LocalWriterConfig(
            _target=<class 'nerfstudio.utils.writer.LocalWriter'>,
            enable=True,
            stats_to_track=(
                <EventName.ITER_TRAIN_TIME: 'Train Iter (time)'>,
                <EventName.TRAIN_RAYS_PER_SEC: 'Train Rays / Sec'>,
                <EventName.CURR_TEST_PSNR: 'Test PSNR'>,
                <EventName.VIS_RAYS_PER_SEC: 'Vis Rays / Sec'>,
                <EventName.TEST_RAYS_PER_SEC: 'Test Rays / Sec'>,
                <EventName.ETA: 'ETA (time)'>
            ),
            max_log_size=10
        ),
        profiler='basic'
    ),
    viewer=ViewerConfig(
        relative_log_filename='viewer_log_filename.txt',
        websocket_port=None,
        websocket_port_default=7007,
        websocket_host='0.0.0.0',
        num_rays_per_chunk=32768,
        max_num_display_images=512,
        quit_on_train_completion=False,
        image_format='jpeg',
        jpeg_quality=75,
        make_share_url=False,
        camera_frustum_scale=0.1,
        default_composite_depth=True
    ),
    pipeline=VanillaPipelineConfig(
        _target=<class 'nerfstudio.pipelines.base_pipeline.VanillaPipeline'>,
        datamanager=ParallelDataManagerConfig(
            _target=<class 'nerfstudio.data.datamanagers.parallel_datamanager.ParallelDataManager'>,
            data=PosixPath('/workspace/processed/test'),
            masks_on_gpu=False,
            images_on_gpu=False,
            dataparser=NerfstudioDataParserConfig(
                _target=<class 'nerfstudio.data.dataparsers.nerfstudio_dataparser.Nerfstudio'>,
                data=PosixPath('.'),
                scale_factor=1.0,
                downscale_factor=None,
                scene_scale=1.0,
                orientation_method='up',
                center_method='poses',
                auto_scale_poses=True,
                eval_mode='fraction',
                train_split_fraction=0.9,
                eval_interval=8,
                depth_unit_scale_factor=0.001,
                mask_color=None,
                load_3D_points=False
            ),
            train_num_rays_per_batch=4096,
            train_num_images_to_sample_from=-1,
            train_num_times_to_repeat_images=-1,
            eval_num_rays_per_batch=4096,
            eval_num_images_to_sample_from=-1,
            eval_num_times_to_repeat_images=-1,
            eval_image_indices=(0,),
            collate_fn=<function nerfstudio_collate at 0x7b7950dfe200>,
            camera_res_scale_factor=1.0,
            patch_size=1,
            camera_optimizer=None,
            pixel_sampler=PixelSamplerConfig(
                _target=<class 'nerfstudio.data.pixel_samplers.PixelSampler'>,
                num_rays_per_batch=4096,
                keep_full_image=False,
                is_equirectangular=False,
                ignore_mask=False,
                fisheye_crop_radius=None,
                rejection_sample_mask=True,
                max_num_iterations=100
            ),
            num_processes=1,
            queue_size=2,
            max_thread_workers=None
        ),
        model=NerfactoModelConfig(
            _target=<class 'nerfstudio.models.nerfacto.NerfactoModel'>,
            enable_collider=True,
            collider_params={'near_plane': 2.0, 'far_plane': 6.0},
            loss_coefficients={'rgb_loss_coarse': 1.0, 'rgb_loss_fine': 1.0},
            eval_num_rays_per_chunk=32768,
            prompt=None,
            near_plane=0.05,
            far_plane=1000.0,
            background_color='last_sample',
            hidden_dim=64,
            hidden_dim_color=64,
            hidden_dim_transient=64,
            num_levels=16,
            base_res=16,
            max_res=2048,
            log2_hashmap_size=19,
            features_per_level=2,
            num_proposal_samples_per_ray=(256, 96),
            num_nerf_samples_per_ray=48,
            proposal_update_every=5,
            proposal_warmup=5000,
            num_proposal_iterations=2,
            use_same_proposal_network=False,
            proposal_net_args_list=[
                {'hidden_dim': 16, 'log2_hashmap_size': 17, 'num_levels': 5, 'max_res': 128, 'use_linear': False},
                {'hidden_dim': 16, 'log2_hashmap_size': 17, 'num_levels': 5, 'max_res': 256, 'use_linear': False}
            ],
            proposal_initial_sampler='piecewise',
            interlevel_loss_mult=1.0,
            distortion_loss_mult=0.002,
            orientation_loss_mult=0.0001,
            pred_normal_loss_mult=0.001,
            use_proposal_weight_anneal=True,
            use_appearance_embedding=True,
            use_average_appearance_embedding=True,
            proposal_weights_anneal_slope=10.0,
            proposal_weights_anneal_max_num_iters=1000,
            use_single_jitter=True,
            predict_normals=False,
            disable_scene_contraction=False,
            use_gradient_scaling=False,
            implementation='tcnn',
            appearance_embed_dim=32,
            average_init_density=0.01,
            camera_optimizer=CameraOptimizerConfig(
                _target=<class 'nerfstudio.cameras.camera_optimizers.CameraOptimizer'>,
                mode='SO3xR3',
                trans_l2_penalty=0.01,
                rot_l2_penalty=0.001,
                optimizer=None,
                scheduler=None
            )
        )
    ),
    optimizers={
        'proposal_networks': {
            'optimizer': AdamOptimizerConfig(
                _target=<class 'torch.optim.adam.Adam'>,
                lr=0.01,
                eps=1e-15,
                max_norm=None,
                weight_decay=0
            ),
            'scheduler': ExponentialDecaySchedulerConfig(
                _target=<class 'nerfstudio.engine.schedulers.ExponentialDecayScheduler'>,
                lr_pre_warmup=1e-08,
                lr_final=0.0001,
                warmup_steps=0,
                max_steps=200000,
                ramp='cosine'
            )
        },
        'fields': {
            'optimizer': AdamOptimizerConfig(
                _target=<class 'torch.optim.adam.Adam'>,
                lr=0.01,
                eps=1e-15,
                max_norm=None,
                weight_decay=0
            ),
            'scheduler': ExponentialDecaySchedulerConfig(
                _target=<class 'nerfstudio.engine.schedulers.ExponentialDecayScheduler'>,
                lr_pre_warmup=1e-08,
                lr_final=0.0001,
                warmup_steps=0,
                max_steps=200000,
                ramp='cosine'
            )
        },
        'camera_opt': {
            'optimizer': AdamOptimizerConfig(
                _target=<class 'torch.optim.adam.Adam'>,
                lr=0.001,
                eps=1e-15,
                max_norm=None,
                weight_decay=0
            ),
            'scheduler': ExponentialDecaySchedulerConfig(
                _target=<class 'nerfstudio.engine.schedulers.ExponentialDecayScheduler'>,
                lr_pre_warmup=1e-08,
                lr_final=0.0001,
                warmup_steps=0,
                max_steps=5000,
                ramp='cosine'
            )
        }
    },
    vis='viewer',
    data=PosixPath('/workspace/processed/test'),
    prompt=None,
    relative_model_dir=PosixPath('nerfstudio_models'),
    load_scheduler=True,
    steps_per_save=2000,
    steps_per_eval_batch=500,
    steps_per_eval_image=500,
    steps_per_eval_all_images=25000,
    max_num_iterations=30000,
    mixed_precision=True,
    use_grad_scaler=False,
    save_only_latest_checkpoint=True,
    load_dir=None,
    load_step=None,
    load_config=None,
    load_checkpoint=None,
    log_gradients=False,
    gradient_accumulation_steps={},
    start_paused=False
)
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Saving config to: outputs/test/nerfacto/2025-02-16_182154/config.yml experiment_config.py:136
Saving checkpoints to: outputs/test/nerfacto/2025-02-16_182154/nerfstudio_models trainer.py:142
Auto image downscale factor of 2 nerfstudio_dataparser.py:484
Started threads
Setting up evaluation dataset...
Caching all 17 images.
Loading data batch ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
Traceback (most recent call last):
  File "/usr/local/bin/ns-train", line 8, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/scripts/train.py", line 262, in entrypoint
    main(
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/scripts/train.py", line 247, in main
    launch(
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/scripts/train.py", line 189, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/scripts/train.py", line 99, in train_loop
    trainer.setup()
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/engine/trainer.py", line 158, in setup
    self.pipeline = self.config.pipeline.setup(
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/configs/base_config.py", line 53, in setup
    return self._target(self, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/pipelines/base_pipeline.py", line 270, in __init__
    self._model = config.model.setup(
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/configs/base_config.py", line 53, in setup
    return self._target(self, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/models/base_model.py", line 85, in __init__
    self.populate_modules()  # populate the modules
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/models/nerfacto.py", line 252, in populate_modules
    self.lpips = LearnedPerceptualImagePatchSimilarity(normalize=True)
  File "/usr/local/lib/python3.10/dist-packages/torchmetrics/image/lpip.py", line 121, in __init__
    self.net = _NoTrainLpips(net=net_type)
  File "/usr/local/lib/python3.10/dist-packages/torchmetrics/functional/image/lpips.py", line 305, in __init__
    self.net = net_type(pretrained=not self.pnet_rand, requires_grad=self.pnet_tune)
  File "/usr/local/lib/python3.10/dist-packages/torchmetrics/functional/image/lpips.py", line 110, in __init__
    alexnet_pretrained_features = _get_net("alexnet", pretrained)
  File "/usr/local/lib/python3.10/dist-packages/torchmetrics/functional/image/lpips.py", line 57, in _get_net
    pretrained_features = getattr(tv, net)(weights=getattr(tv, _weight_map[net]).IMAGENET1K_V1).features
  File "/usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py", line 142, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py", line 228, in inner_wrapper
    return builder(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/models/alexnet.py", line 117, in alexnet
    model.load_state_dict(weights.get_state_dict(progress=progress, check_hash=True))
  File "/usr/local/lib/python3.10/dist-packages/torchvision/models/_api.py", line 90, in get_state_dict
    return load_state_dict_from_url(self.url, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/hub.py", line 746, in load_state_dict_from_url
    os.makedirs(model_dir)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/.cache'
Why can ns-process-data create files and directories, but ns-train can't? Especially since it worked in the past. The workspace is in my host's home directory, and Docker has to be run as root (via sudo) anyway.
@ahfabi Hi, I got the exact same issue and resolved it as follows:
1. When starting the container, mount the container's /.cache/ and /.local/ directories to directories on the host:
# The two -v mounts for /.cache/ and /.local/ are the important ones.
docker run --gpus all \
    -u $(id -u) \
    -v /home/Codes/nerfstudio/data:/workspace/ \
    -v /home/Codes/nerfstudio/.cache/:/.cache/ \
    -v /home/Codes/nerfstudio/.local/:/.local/ \
    -p 7007:7007 \
    --rm \
    -it \
    --shm-size=12gb \
    nerfstudio
2. In the host's terminal (not in the Docker container), change the file permissions:
chmod -R 777 /home/Codes/nerfstudio/.cache/
chmod -R 777 /home/Codes/nerfstudio/.local/
@mikigom Thanks for the suggestion! Since I am inexperienced with Docker: is it normal to have to circumvent security with 777 or sudo? I had assumed an error in my command or some misconfiguration.
@ahfabi As far as I know, you generally do not have to give everything 777 permissions or run Docker as root. By default, many Docker images are designed to run as root inside the container. However, the tutorial uses the option -u $(id -u) when running the container, meaning the container runs with the host user's UID. This UID/GID mismatch can cause permission issues when the container tries to write to directories (e.g. /.cache/, /.local/) that are only writable by root in the container filesystem.
In short, the more general solution is to match the UID and GID of the host user and the container user precisely. For example, use -u 1000:1000 if your host user has uid=1000 and gid=1000.
Related Link: https://stackoverflow.com/questions/51596279/docker-permission-denied-in-container
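Putting that together with the mounts above, a hedged sketch (host paths are placeholders; the .cache/.local mounts are likely still needed, since matching the UID/GID alone does not make the container's root filesystem writable):

# run with the host user's UID and GID, plus writable cache mounts
docker run --gpus all \
    -u $(id -u):$(id -g) \
    -v /path/to/data/:/workspace/ \
    -v /path/to/.cache/:/.cache/ \
    -v /path/to/.local/:/.local/ \
    -p 7007:7007 --rm -it --shm-size=12gb \
    ghcr.io/nerfstudio-project/nerfstudio:latest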
So everybody should have encountered this issue. Then why isn't the docker command from the tutorial adjusted so it just works?
Tried it with -u $(id -u):$(id -g), but the PermissionError is still there, and I also get groups: cannot find name for group ID.
> that are only writable by root in the container filesystem.
It also happens with the outputs directory; I assume that should be created in the working directory and be writable by the normal host user.
The weird thing is that I remember all of this working without these workarounds. Why was that?