LLaVA
[Usage] llava-v1.6-mistral-7b will load in the demo, but llava-v1.6-34b will not.
Describe the issue
Issue: When starting a worker with the 34B version of the 1.6 model, the worker crashes on the first inference. I've verified that the mistral-7b version works and I can run the demo with it; this only happens with the 34B:
Command:
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ~/models/liuhaotian_llava-v1.6-34b/
Log:
[2024-01-31 22:40:12,336] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-01-31 22:40:12 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40000, worker_address='http://localhost:40000', controller_address='http://localhost:10000', model_path='/home/iceman/models/liuhaotian_llava-v1.6-34b/', model_base=None, model_name=None, device='cuda', multi_modal=False, limit_model_concurrency=5, stream_interval=1, no_register=False, load_8bit=False, load_4bit=False) 2024-01-31 22:40:12 | INFO | model_worker | Loading the model liuhaotian_llava-v1.6-34b on worker b95d53 ... You are using a model of type llava to instantiate a model of type llava_llama. This is not supported for all configurations of models and can yield errors. Loading checkpoint shards: 0%| | 0/15 [00:00<?, ?it/s]Loading checkpoint shards: 7%|███▊ | 1/15 [00:01<00:24, 1.78s/it]Loading checkpoint shards: 13%|███████▌ | 2/15 [00:03<00:23, 1.78s/it]Loading checkpoint shards: 20%|███████████▍ | 3/15 [00:05<00:21, 1.82s/it]Loading checkpoint shards: 27%|███████████████▏ | 4/15 [00:07<00:19, 1.81s/it]Loading checkpoint shards: 33%|███████████████████ | 5/15 [00:08<00:17, 1.80s/it]Loading checkpoint shards: 40%|██████████████████████▊ | 6/15 [00:10<00:16, 1.83s/it]Loading checkpoint shards: 47%|██████████████████████████▌ | 7/15 [00:12<00:14, 1.81s/it]Loading checkpoint shards: 53%|██████████████████████████████▍ | 8/15 [00:14<00:12, 1.80s/it]Loading checkpoint shards: 60%|██████████████████████████████████▏ | 9/15 [00:16<00:10, 1.82s/it]Loading checkpoint shards: 67%|█████████████████████████████████████▎ | 10/15 [00:18<00:09, 1.81s/it]Loading checkpoint shards: 73%|█████████████████████████████████████████ | 11/15 [00:19<00:07, 1.80s/it]Loading checkpoint shards: 80%|████████████████████████████████████████████▊ | 12/15 [00:21<00:05, 1.82s/it]Loading checkpoint shards: 87%|████████████████████████████████████████████████▌ | 13/15 [00:23<00:03, 1.81s/it]Loading checkpoint shards: 93%|████████████████████████████████████████████████████▎ | 14/15 [00:25<00:01, 1.80s/it]Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 15/15 [00:26<00:00, 1.49s/it]Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 15/15 [00:26<00:00, 1.74s/it]2024-01-31 22:40:42 | ERROR | stderr |
2024-01-31 22:40:43 | INFO | model_worker | Register to controller
2024-01-31 22:40:43 | ERROR | stderr | INFO: Started server process [7458]
2024-01-31 22:40:43 | ERROR | stderr | INFO: Waiting for application startup.
2024-01-31 22:40:43 | ERROR | stderr | INFO: Application startup complete.
2024-01-31 22:40:43 | ERROR | stderr | INFO: Uvicorn running on http://0.0.0.0:40000 (Press CTRL+C to quit)
2024-01-31 22:40:50 | INFO | stdout | INFO: 127.0.0.1:39398 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-01-31 22:40:54 | INFO | model_worker | Send heart beat. Models: ['liuhaotian_llava-v1.6-34b']. Semaphore: Semaphore(value=4, locked=False). global_counter: 1
2024-01-31 22:40:54 | INFO | stdout | INFO: 127.0.0.1:39402 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-01-31 22:40:54 | ERROR | stderr | Exception in thread Thread-3 (generate):
2024-01-31 22:40:54 | ERROR | stderr | Traceback (most recent call last):
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/miniconda3/envs/llava/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
2024-01-31 22:40:54 | ERROR | stderr | self.run()
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/miniconda3/envs/llava/lib/python3.10/threading.py", line 953, in run
2024-01-31 22:40:54 | ERROR | stderr | self._target(*self._args, **self._kwargs)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-01-31 22:40:54 | ERROR | stderr | return func(*args, **kwargs)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/src/LLaVA/llava/model/language_model/llava_llama.py", line 125, in generate
2024-01-31 22:40:54 | ERROR | stderr | ) = self.prepare_inputs_labels_for_multimodal(
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/src/LLaVA/llava/model/llava_arch.py", line 157, in prepare_inputs_labels_for_multimodal
2024-01-31 22:40:54 | ERROR | stderr | image_features = self.encode_images(concat_images)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/src/LLaVA/llava/model/llava_arch.py", line 141, in encode_images
2024-01-31 22:40:54 | ERROR | stderr | image_features = self.get_model().get_vision_tower()(images)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2024-01-31 22:40:54 | ERROR | stderr | return forward_call(*args, **kwargs)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/miniconda3/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
2024-01-31 22:40:54 | ERROR | stderr | output = old_forward(*args, **kwargs)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-01-31 22:40:54 | ERROR | stderr | return func(*args, **kwargs)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/src/LLaVA/llava/model/multimodal_encoder/clip_encoder.py", line 50, in forward
2024-01-31 22:40:54 | ERROR | stderr | image_forward_outs = self.vision_tower(images.to(device=self.device, dtype=self.dtype), output_hidden_states=True)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2024-01-31 22:40:54 | ERROR | stderr | return forward_call(*args, **kwargs)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/miniconda3/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
2024-01-31 22:40:54 | ERROR | stderr | output = old_forward(*args, **kwargs)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 917, in forward
2024-01-31 22:40:54 | ERROR | stderr | return self.vision_model(
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2024-01-31 22:40:54 | ERROR | stderr | return forward_call(*args, **kwargs)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/miniconda3/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
2024-01-31 22:40:54 | ERROR | stderr | output = old_forward(*args, **kwargs)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 841, in forward
2024-01-31 22:40:54 | ERROR | stderr | hidden_states = self.embeddings(pixel_values)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2024-01-31 22:40:54 | ERROR | stderr | return forward_call(*args, **kwargs)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/miniconda3/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
2024-01-31 22:40:54 | ERROR | stderr | output = old_forward(*args, **kwargs)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 182, in forward
2024-01-31 22:40:54 | ERROR | stderr | patch_embeds = self.patch_embedding(pixel_values.to(dtype=target_dtype)) # shape = [*, width, grid, grid]
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2024-01-31 22:40:54 | ERROR | stderr | return forward_call(*args, **kwargs)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/miniconda3/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
2024-01-31 22:40:54 | ERROR | stderr | output = old_forward(*args, **kwargs)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
2024-01-31 22:40:54 | ERROR | stderr | return self._conv_forward(input, self.weight, self.bias)
2024-01-31 22:40:54 | ERROR | stderr | File "/home/iceman/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
2024-01-31 22:40:54 | ERROR | stderr | return F.conv2d(input, weight, bias, self.stride,
2024-01-31 22:40:54 | ERROR | stderr | RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution)
Since the error complains about tensors on two CUDA devices (this is a 2x6000 workstation), I tried running with CUDA_VISIBLE_DEVICES=0 to restrict it to a single card, but that doesn't work either: the worker never launches successfully, crashing hard before it communicates with the gradio process:
Command:
CUDA_VISIBLE_DEVICES=0 python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ~/models/liuhaotian_llava-v1.6-34b/
Log:
[2024-01-31 22:46:17,323] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-01-31 22:46:17 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40000, worker_address='http://localhost:40000', controller_address='http://localhost:10000', model_path='/home/iceman/models/liuhaotian_llava-v1.6-34b/', model_base=None, model_name=None, device='cuda', multi_modal=False, limit_model_concurrency=5, stream_interval=1, no_register=False, load_8bit=False, load_4bit=False)
2024-01-31 22:46:17 | INFO | model_worker | Loading the model liuhaotian_llava-v1.6-34b on worker f19483 ...
You are using a model of type llava to instantiate a model of type llava_llama. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 100%|████████████████| 15/15 [00:24<00:00, 1.66s/it]
2024-01-31 22:46:46 | ERROR | stderr |
2024-01-31 22:46:46 | ERROR | stderr | Traceback (most recent call last):
2024-01-31 22:46:46 | ERROR | stderr | File "/home/iceman/miniconda3/envs/llava/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2024-01-31 22:46:46 | ERROR | stderr | return _run_code(code, main_globals, None,
2024-01-31 22:46:46 | ERROR | stderr | File "/home/iceman/miniconda3/envs/llava/lib/python3.10/runpy.py", line 86, in _run_code
2024-01-31 22:46:46 | ERROR | stderr | exec(code, run_globals)
2024-01-31 22:46:46 | ERROR | stderr | File "/home/iceman/src/LLaVA/llava/serve/model_worker.py", line 277, in <module>
2024-01-31 22:46:46 | ERROR | stderr | worker = ModelWorker(args.controller_address,
2024-01-31 22:46:46 | ERROR | stderr | File "/home/iceman/src/LLaVA/llava/serve/model_worker.py", line 65, in __init__
2024-01-31 22:46:46 | ERROR | stderr | self.tokenizer, self.model, self.image_processor, self.context_len = load_pretrained_model(
2024-01-31 22:46:46 | ERROR | stderr | File "/home/iceman/src/LLaVA/llava/model/builder.py", line 151, in load_pretrained_model
2024-01-31 22:46:46 | ERROR | stderr | vision_tower.to(device=device, dtype=torch.float16)
2024-01-31 22:46:46 | ERROR | stderr | File "/home/iceman/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
2024-01-31 22:46:46 | ERROR | stderr | return self._apply(convert)
2024-01-31 22:46:46 | ERROR | stderr | File "/home/iceman/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
2024-01-31 22:46:46 | ERROR | stderr | module._apply(fn)
2024-01-31 22:46:46 | ERROR | stderr | File "/home/iceman/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
2024-01-31 22:46:46 | ERROR | stderr | module._apply(fn)
2024-01-31 22:46:46 | ERROR | stderr | File "/home/iceman/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
2024-01-31 22:46:46 | ERROR | stderr | module._apply(fn)
2024-01-31 22:46:46 | ERROR | stderr | [Previous line repeated 1 more time]
2024-01-31 22:46:46 | ERROR | stderr | File "/home/iceman/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
2024-01-31 22:46:46 | ERROR | stderr | param_applied = fn(param)
2024-01-31 22:46:46 | ERROR | stderr | File "/home/iceman/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
2024-01-31 22:46:46 | ERROR | stderr | return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
2024-01-31 22:46:46 | ERROR | stderr | NotImplementedError: Cannot copy out of meta tensor; no data!
This was run at commit c878cc3e66f75eb8227870be3d30268789913f82.
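In case it helps with debugging, here is my reading of the first traceback (just a sketch, not verified): under device_map="auto" the CLIP vision tower's patch-embedding conv can end up on a different GPU than the one clip_encoder.py moves the pixel tensor to, so the very first conv call fails. A hypothetical tweak to llava/model/multimodal_encoder/clip_encoder.py that sends the images to wherever the conv weights actually landed would look roughly like this (module path taken from the traceback, otherwise untested):

# Hypothetical change inside CLIPVisionTower.forward (clip_encoder.py), untested:
# route the pixel tensor to the device where the patch-embedding weights live,
# rather than self.device, which can disagree when the model is sharded.
tower_device = self.vision_tower.vision_model.embeddings.patch_embedding.weight.device
image_forward_outs = self.vision_tower(
    images.to(device=tower_device, dtype=self.dtype),
    output_hidden_states=True,
)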
Same problem
Same issue #1039
Bumping the VRAM to 80GB appears to have resolved it for me. Possibly an OOM error?
That would explain why I get the error when I restrict CUDA visibility to a single 48GB card, but it doesn't solve the main problem: two 48GB cards should(tm) provide enough VRAM, and the main bug here is that it isn't splitting between the cards.
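For reference, this is roughly how I'd check how the weights are actually being split (a sketch using the same builder call the worker uses and the path from my log above; untested):

from llava.model.builder import load_pretrained_model

# Load the same way model_worker does (device_map defaults to "auto") and then
# inspect how accelerate distributed the modules across the two cards.
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="/home/iceman/models/liuhaotian_llava-v1.6-34b/",
    model_base=None,
    model_name="liuhaotian_llava-v1.6-34b",
)

# hf_device_map is attached by transformers/accelerate when a device_map is used.
for module_name, device in getattr(model, "hf_device_map", {}).items():
    print(module_name, "->", device)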
Same issue here. Were you able to fix that? @levi @iceman-p
I suspect this is related to device="auto" and low_cpu_mem_usage=True.
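A minimal illustration of the second failure, assuming that is indeed what's going on (this snippet is illustrative only, not taken from the LLaVA code): with low_cpu_mem_usage=True plus a device map, any weights accelerate has not materialized on a real device stay on the meta device, and calling .to() on such a module raises exactly the error in the second log:

import torch
import torch.nn as nn

# A parameter created on the "meta" device has shape/dtype metadata but no
# storage; this is how accelerate defers loading weights.
layer = nn.Linear(4, 4, device="meta")

try:
    # Same call pattern as builder.py's vision_tower.to(device=..., dtype=...).
    layer.to(device="cuda:0", dtype=torch.float16)
except NotImplementedError as err:
    print(err)  # NotImplementedError: Cannot copy out of meta tensor; no data!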
@iceman-p Hi, how did you load the 7B one? I'm having trouble loading it; I get https://github.com/haotian-liu/LLaVA/issues/1112