threestudio
Multiple errors, WSL2 Ubuntu
Hello, all sorts of issues here culminating in a "FileNotFoundError: Text embedding file .threestudio_cache/text_embeddings/380af1c90b3b8ac914fde9dd32b144db.pt for model DeepFloyd/IF-I-XL-v1.0 and prompt [a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes] not found."
Fresh environment and install; the problem only occurs when using the DeepFloyd method. I'm assuming it has something to do with authentication, but I really have no idea.
(threestudio) username@MYPC:~/foldername/mediagen/threestudio$ python launch.py --config configs/dreamfusion-if.yaml --train --gpu 0 system.prompt_processor.prompt="a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes"
Global seed set to 0
[INFO] ModelCheckpoint(save_last=True, save_top_k=-1, monitor=None) will duplicate the last checkpoint saved.
[INFO] Using 16bit Automatic Mixed Precision (AMP)
[INFO] GPU available: True (cuda), used: True
[INFO] TPU available: False, using: 0 TPU cores
[INFO] IPU available: False, using: 0 IPUs
[INFO] HPU available: False, using: 0 HPUs
[INFO] You are using a CUDA device ('NVIDIA GeForce RTX 4090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
[INFO] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[INFO]
| Name | Type | Params
--------------------------------------------------------------
0 | geometry | ImplicitVolume | 12.6 M
1 | material | DiffuseWithPointLightMaterial | 0
2 | background | NeuralEnvironmentMapBackground | 448
3 | renderer | NeRFVolumeRenderer | 0
--------------------------------------------------------------
12.6 M Trainable params
0 Non-trainable params
12.6 M Total params
50.419 Total estimated model params size (MB)
[INFO] Validation results will be saved to outputs/dreamfusion-if/a_zoomed_out_DSLR_photo_of_a_baby_bunny_sitting_on_top_of_a_stack_of_pancakes@20230617-052807/save
[INFO] Using prompt [a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes] and negative prompt []
[INFO] Using view-dependent prompts [side]:[a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes, side view] [front]:[a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes, front view] [back]:[a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes, back view] [overhead]:[a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes, overhead view]
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/username/anaconda3/envs/threestudio did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/opt/conda/lib'), PosixPath('/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/cv2/../../lib64')}
warn(msg)
/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/cv2/../../lib64:/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/cv2/../../lib64::/opt/conda/lib/ did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA exception! Error code: no CUDA-capable device is detected
CUDA exception! Error code: initialization error
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
warn(msg)
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
Process SpawnProcess-1:
Traceback (most recent call last):
File "/home/username/anaconda3/envs/threestudio/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/username/anaconda3/envs/threestudio/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/username/foldername/mediagen/threestudio/threestudio/models/prompt_processors/deepfloyd_prompt_processor.py", line 61, in spawn_func
text_encoder = T5EncoderModel.from_pretrained(
File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2881, in from_pretrained
) = cls._load_pretrained_model(
File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3228, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/transformers/modeling_utils.py", line 728, in _load_state_dict_into_meta_model
set_module_quantized_tensor_to_device(
File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/transformers/utils/bitsandbytes.py", line 89, in set_module_quantized_tensor_to_device
new_value = bnb.nn.Int8Params(new_value, requires_grad=False, **kwargs).to(device)
File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 294, in to
return self.cuda(device)
File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 258, in cuda
CB, CBt, SCB, SCBt, coo_tensorB = bnb.functional.double_quant(B)
File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1987, in double_quant
row_stats, col_stats, nnz_row_ptr = get_colrow_absmax(
File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1876, in get_colrow_absmax
lib.cget_col_row_stats(ptrA, ptrRowStats, ptrColStats, ptrNnzrows, ct.c_float(threshold), rows, cols)
File "/home/username/anaconda3/envs/threestudio/lib/python3.10/ctypes/__init__.py", line 387, in __getattr__
func = self.__getitem__(name)
File "/home/username/anaconda3/envs/threestudio/lib/python3.10/ctypes/__init__.py", line 392, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
╭───────────────────── Traceback (most recent call last) ─────────────────────╮
│ /home/username/foldername/mediagen/threestudio/launch.py:180 in │
│ <module> │
│ │
│ 177 │
│ 178 │
│ 179 if __name__ == "__main__": │
│ ❱ 180 │ main() │
│ 181 │
│ │
│ /home/username/foldername/mediagen/threestudio/launch.py:164 in main │
│ │
│ 161 │ │ system.set_resume_status(ckpt["epoch"], ckpt["global_step"]) │
│ 162 │ │
│ 163 │ if args.train: │
│ ❱ 164 │ │ trainer.fit(system, datamodule=dm, ckpt_path=cfg.resume) │
│ 165 │ │ trainer.test(system, datamodule=dm) │
│ 166 │ elif args.validate: │
│ 167 │ │ # manually set epoch and global_step as they cannot be automa │
│ │
│ /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/pytorch_l │
│ ightning/trainer/trainer.py:531 in fit │
│ │
│ 528 │ │ """ │
│ 529 │ │ model = _maybe_unwrap_optimized(model) │
│ 530 │ │ self.strategy._lightning_module = model │
│ ❱ 531 │ │ call._call_and_handle_interrupt( │
│ 532 │ │ │ self, self._fit_impl, model, train_dataloaders, val_data │
│ 533 │ │ ) │
│ 534 │
│ │
│ /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/pytorch_l │
│ ightning/trainer/call.py:42 in _call_and_handle_interrupt │
│ │
│ 39 │ try: │
│ 40 │ │ if trainer.strategy.launcher is not None: │
│ 41 │ │ │ return trainer.strategy.launcher.launch(trainer_fn, *args │
│ ❱ 42 │ │ return trainer_fn(*args, **kwargs) │
│ 43 │ │
│ 44 │ except _TunerExitException: │
│ 45 │ │ _call_teardown_hook(trainer) │
│ │
│ /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/pytorch_l │
│ ightning/trainer/trainer.py:570 in _fit_impl │
│ │
│ 567 │ │ │ model_provided=True, │
│ 568 │ │ │ model_connected=self.lightning_module is not None, │
│ 569 │ │ ) │
│ ❱ 570 │ │ self._run(model, ckpt_path=ckpt_path) │
│ 571 │ │ │
│ 572 │ │ assert self.state.stopped │
│ 573 │ │ self.training = False │
│ │
│ /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/pytorch_l │
│ ightning/trainer/trainer.py:956 in _run │
│ │
│ 953 │ │ # hook │
│ 954 │ │ if self.state.fn == TrainerFn.FITTING: │
│ 955 │ │ │ call._call_callback_hooks(self, "on_fit_start") │
│ ❱ 956 │ │ │ call._call_lightning_module_hook(self, "on_fit_start") │
│ 957 │ │ │
│ 958 │ │ _log_hyperparams(self) │
│ 959 │
│ │
│ /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/pytorch_l │
│ ightning/trainer/call.py:140 in _call_lightning_module_hook │
│ │
│ 137 │ pl_module._current_fx_name = hook_name │
│ 138 │ │
│ 139 │ with trainer.profiler.profile(f"[LightningModule]{pl_module.__cla │
│ ❱ 140 │ │ output = fn(*args, **kwargs) │
│ 141 │ │
│ 142 │ # restore current_fx when nested context │
│ 143 │ pl_module._current_fx_name = prev_fx_name │
│ │
│ /home/username/foldername/mediagen/threestudio/threestudio/systems/dre │
│ amfusion.py:32 in on_fit_start │
│ │
│ 29 │ def on_fit_start(self) -> None: │
│ 30 │ │ super().on_fit_start() │
│ 31 │ │ # only used in training │
│ ❱ 32 │ │ self.prompt_processor = threestudio.find(self.cfg.prompt_proc │
│ 33 │ │ │ self.cfg.prompt_processor │
│ 34 │ │ ) │
│ 35 │ │ self.guidance = threestudio.find(self.cfg.guidance_type)(self │
│ │
│ /home/username/foldername/mediagen/threestudio/threestudio/utils/base. │
│ py:63 in __init__ │
│ │
│ 60 │ │ super().__init__() │
│ 61 │ │ self.cfg = parse_structured(self.Config, cfg) │
│ 62 │ │ self.device = get_device() │
│ ❱ 63 │ │ self.configure(*args, **kwargs) │
│ 64 │ │
│ 65 │ def configure(self, *args, **kwargs) -> None: │
│ 66 │ │ pass │
│ │
│ /home/username/foldername/mediagen/threestudio/threestudio/models/prom │
│ pt_processors/base.py:336 in configure │
│ │
│ 333 │ │ ] │
│ 334 │ │ │
│ 335 │ │ self.prepare_text_embeddings() │
│ ❱ 336 │ │ self.load_text_embeddings() │
│ 337 │ │
│ 338 │ @staticmethod │
│ 339 │ def spawn_func(pretrained_model_name_or_path, prompts, cache_dir) │
│ │
│ /home/username/foldername/mediagen/threestudio/threestudio/models/prom │
│ pt_processors/base.py:392 in load_text_embeddings │
│ │
│ 389 │ def load_text_embeddings(self): │
│ 390 │ │ # synchronize, to ensure the text embeddings have been comput │
│ 391 │ │ barrier() │
│ ❱ 392 │ │ self.text_embeddings = self.load_from_cache(self.prompt)[None │
│ 393 │ │ self.uncond_text_embeddings = self.load_from_cache(self.negat │
│ 394 │ │ │ None, ... │
│ 395 │ │ ] │
│ │
│ /home/username/foldername/mediagen/threestudio/threestudio/models/prom │
│ pt_processors/base.py:410 in load_from_cache │
│ │
│ 407 │ │ │ f"{hash_prompt(self.cfg.pretrained_model_name_or_path, pr │
│ 408 │ │ ) │
│ 409 │ │ if not os.path.exists(cache_path): │
│ ❱ 410 │ │ │ raise FileNotFoundError( │
│ 411 │ │ │ │ f"Text embedding file {cache_path} for model {self.cf │
│ 412 │ │ │ ) │
│ 413 │ │ return torch.load(cache_path, map_location=self.device) │
╰─────────────────────────────────────────────────────────────────────────────╯
FileNotFoundError: Text embedding file .threestudio_cache/text_embeddings/380af1c90b3b8ac914fde9dd32b144db.pt for model DeepFloyd/IF-I-XL-v1.0 and prompt [a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes] not found.
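Reading the two tracebacks together: the main process raises FileNotFoundError in load_from_cache because the embedding file was never written, and it was never written because the spawned process (SpawnProcess-1, spawn_func) crashed earlier with the bitsandbytes AttributeError. The stdlib-only sketch below illustrates that check. The names hypothetical_cache_path and check_embedding_cache are made up for illustration, and the MD5-of-"model-prompt" key is an assumption: the 32-hex-character file name in the log is consistent with an MD5 digest, but threestudio's real hash_prompt() may combine its inputs differently, so do not expect it to reproduce 380af1c90b3b8ac914fde9dd32b144db exactly.

```python
import hashlib
import os


def hypothetical_cache_path(cache_dir: str, model: str, prompt: str) -> str:
    """Build a cache path shaped like the one in the error message.

    ASSUMPTION: the key is the MD5 hex digest of f"{model}-{prompt}";
    threestudio's actual hash_prompt() may differ.
    """
    key = hashlib.md5(f"{model}-{prompt}".encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, f"{key}.pt")


def check_embedding_cache(cache_dir: str, model: str, prompt: str) -> str:
    """Return the cache path, or raise with a hint about the real cause."""
    path = hypothetical_cache_path(cache_dir, model, prompt)
    if not os.path.exists(path):
        # The embedding is written by a separate process; if that process
        # died, its own traceback (earlier in the log) holds the real error.
        raise FileNotFoundError(
            f"{path} is missing; the embedding subprocess probably crashed, "
            "so look for its traceback earlier in the log."
        )
    return path
```

In this report the useful line is therefore the AttributeError about cget_col_row_stats, not the FileNotFoundError at the bottom.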
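Hi,
It appears that the prompt processor fails to generate the text embedding, which results in the file-not-found error. I believe the underlying error comes from bitsandbytes: it fell back to its CPU-only library (libbitsandbytes_cpu.so), which lacks the cget_col_row_stats symbol, so loading the T5 text encoder in 8-bit crashed in the spawned process. You can refer to this issue https://github.com/TimDettmers/bitsandbytes/issues/156 to resolve it.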
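To see which native binaries an installed bitsandbytes actually ships (and therefore whether a CUDA build exists to use instead of the CPU one), here is a small stdlib-only sketch. It locates the package with importlib.util.find_spec, so it does not import bitsandbytes itself. The file-name pattern is an assumption based on the names that appear in this thread (libbitsandbytes_cpu.so, libbitsandbytes_cuda118.so); other bitsandbytes versions may name or package their binaries differently.

```python
import importlib.util
import os
import re
from typing import Dict, Iterable, List, Optional, Union

# ASSUMPTION: CUDA builds are named like libbitsandbytes_cuda118.so,
# optionally with a "_nocublaslt" suffix.
CUDA_LIB = re.compile(r"libbitsandbytes_cuda\d+(_nocublaslt)?\.so")


def bitsandbytes_dir() -> Optional[str]:
    """Return the installed bitsandbytes package directory, or None."""
    spec = importlib.util.find_spec("bitsandbytes")  # locates without importing
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


def classify_native_libs(names: Iterable[str]) -> Dict[str, Union[str, None, List[str]]]:
    """Split file names into the CPU build and any CUDA builds."""
    cpu: Optional[str] = None
    cuda: List[str] = []
    for name in sorted(names):
        if name == "libbitsandbytes_cpu.so":
            cpu = name
        elif CUDA_LIB.fullmatch(name):
            cuda.append(name)
    return {"cpu": cpu, "cuda": cuda}
```

Usage: `d = bitsandbytes_dir()` followed by `classify_native_libs(os.listdir(d))` when `d` is not None. An empty "cuda" list would suggest the install has no CUDA build to fall back on.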
Thank you! I just saw another post about a bitsandbytes issue, and I've had to solve it before for other programs, so I have a good feeling this will work. I will update and close this after I get some sleep and a chance to check it!
For anybody who needs it, I fixed the issue by following the advice on the page @DSaurus linked: copy the CUDA version of the library over libbitsandbytes_cpu.so. Specifically, since I'm running Python 3.10.9 and CUDA 11.8, in your env directory copy lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so (or whichever file matches your CUDA version) over lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so. I also added
export LD_LIBRARY_PATH="/usr/lib/wsl/lib:/usr/local/cuda/lib64"
export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
to .bashrc (be sure to source it after making the changes).
After that, everything seems to be running perfectly! :)
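The two steps in the fix above can be sketched in Python as follows; this is an illustration under stated assumptions, not an official bitsandbytes tool. swap_in_cuda_lib and dirs_containing are hypothetical helper names. pkg_dir is assumed to be the bitsandbytes folder inside your env's site-packages, and cuda_tag is the CUDA version without the dot ("118" for CUDA 11.8). Unlike a plain copy, the sketch keeps a one-time backup of the original CPU library so the change can be reverted. The second helper checks whether the directories added to LD_LIBRARY_PATH actually contain a given library.

```python
import os
import shutil
from typing import List


def swap_in_cuda_lib(pkg_dir: str, cuda_tag: str) -> str:
    """Copy libbitsandbytes_cuda<tag>.so over libbitsandbytes_cpu.so.

    The original CPU library is saved once as libbitsandbytes_cpu.so.bak;
    the backup path is returned.
    """
    cpu_lib = os.path.join(pkg_dir, "libbitsandbytes_cpu.so")
    cuda_lib = os.path.join(pkg_dir, f"libbitsandbytes_cuda{cuda_tag}.so")
    if not os.path.isfile(cuda_lib):
        raise FileNotFoundError(f"no CUDA build at {cuda_lib}")
    backup = cpu_lib + ".bak"
    if os.path.isfile(cpu_lib) and not os.path.exists(backup):
        shutil.copy2(cpu_lib, backup)  # keep the untouched CPU build once
    shutil.copy2(cuda_lib, cpu_lib)
    return backup


def dirs_containing(search_path: str, filename: str) -> List[str]:
    """Return the entries of a colon-separated path list that hold filename."""
    return [
        entry
        for entry in search_path.split(os.pathsep)
        if entry and os.path.isfile(os.path.join(entry, filename))
    ]
```

After sourcing .bashrc, `dirs_containing(os.environ.get("LD_LIBRARY_PATH", ""), "libcudart.so")` would be expected to include /usr/local/cuda/lib64, and checking "libcuda.so" the same way would typically point at /usr/lib/wsl/lib on WSL2; an empty list means that library is not reachable through LD_LIBRARY_PATH.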
Glad to hear this and thanks for sharing the solution! @Ainaemaet
Hi, I got the code running by changing the call at line 61 in deepfloyd_prompt_processor.py, which reads:
text_encoder = T5EncoderModel.from_pretrained(
    pretrained_model_name_or_path,
    subfolder="text_encoder",
    torch_dtype=torch.float16,  # suppress warning
    load_in_8bit=True,
    variant="8bit",
    device_map="auto",
)
I simply set load_in_8bit=False and it works. Does this hurt performance, or does it just make it a bit slower?
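One part of this question is simple arithmetic: with load_in_8bit=False and torch_dtype=torch.float16, each encoder weight takes 2 bytes instead of roughly 1, so the text encoder needs about twice the GPU memory for its weights. The back-of-envelope sketch below ignores activations, quantization statistics and CUDA overhead, and the parameter count is an input you supply, not a figure from this thread. Judging from the traceback, the encoder only precomputes embeddings that are then loaded from .threestudio_cache, so its speed probably matters less than whether it fits in memory; that is an inference, not something measured here.

```python
from typing import Dict

# Bytes per weight for the storage formats discussed in this thread.
BYTES_PER_PARAM: Dict[str, int] = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


def compare_weight_memory(n_params: float) -> Dict[str, float]:
    """Weight memory in GiB for each dtype, for a model with n_params weights."""
    return {
        dtype: weight_memory_gib(n_params, nbytes)
        for dtype, nbytes in BYTES_PER_PARAM.items()
    }
```

For example, `compare_weight_memory(1e9)` gives the per-dtype footprint of a hypothetical one-billion-parameter encoder; the float16 entry is exactly double the int8 one.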