h2o-llmstudio
[FEATURE] Select GPU for Inference / Graceful OOM Error
🚀 Feature
I trained an experiment on a specific GPU of a multi-GPU machine. When selecting the Chat tab within that experiment, the model attempts to load on the first GPU, and because that GPU is in use, the load fails with an OOM error. The tab shows an "Oops! Something went wrong" message.
Suggest that:
- users can select GPUs globally to be used in LLM Studio for both training and inference, or
- the Chat tab allows selecting the GPU to use for inference (a sketch follows this list), or
- OOM errors are handled more gracefully (see the sketch at the end of this report).
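A minimal sketch of the per-chat GPU option, assuming a hypothetical `gpu_id` value coming from the chat settings. `cfg.architecture.model_class(cfg)` is the same call that OOMs in the trace below, and the `torch.device` context manager (torch >= 2.0) is the mechanism already visible in the trace as `torch/utils/_device.py`:

```python
import torch

def load_model_on_gpu(cfg, gpu_id: int = 0):
    """Hypothetical helper: build the chat model on a user-chosen GPU.

    `cfg.architecture.model_class(cfg)` mirrors the call inside
    `load_cfg_model_tokenizer`; `gpu_id` is an assumed new setting.
    """
    # torch >= 2.0: tensors created inside this context default to
    # cuda:<gpu_id> instead of cuda:0, so the weights never touch GPU 0.
    with torch.device(f"cuda:{gpu_id}"):
        model = cfg.architecture.model_class(cfg)
    return model.eval()
```

A global variant already works today as a workaround: launching LLM Studio with `CUDA_VISIBLE_DEVICES` set to an idle device hides the busy GPU from the process entirely.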
Example OOM error on the Chat tab:
q.app
script_sources: ['/_f/8ec09477-62cc-477b-bc34-2a6318389fa8/tmpcis0c60y.min.js']
initialized: True
wave_utils_stack_trace_str: ### stacktrace
Traceback (most recent call last):
File "/workspace/./app_utils/handlers.py", line 314, in handle
await experiment_display(q)
File "/workspace/./app_utils/sections/experiment.py", line 950, in experiment_display
await chat_tab(q)
File "/workspace/./app_utils/sections/experiment.py", line 1168, in chat_tab
cfg, model, tokenizer = load_cfg_model_tokenizer(
File "/workspace/./app_utils/sections/experiment.py", line 1901, in load_cfg_model_tokenizer
model = cfg.architecture.model_class(cfg)
File "/workspace/./llm_studio/src/models/text_causal_language_modeling_model.py", line 97, in __init__
self.backbone = create_nlp_backbone(
File "/workspace/./llm_studio/src/utils/modeling_utils.py", line 501, in create_nlp_backbone
backbone = model_class.from_config(config, **kwargs)
File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 425, in from_config
return model_class._from_config(config, **kwargs)
File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1143, in _from_config
model = cls(config, **kwargs)
File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 604, in __init__
self.gpt_neox = GPTNeoXModel(config)
File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 427, in __init__
self.layers = nn.ModuleList([GPTNeoXLayer(config) for _ in range(config.num_hidden_layers)])
File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 427, in <listcomp>
self.layers = nn.ModuleList([GPTNeoXLayer(config) for _ in range(config.num_hidden_layers)])
File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 318, in __init__
self.attention = GPTNeoXAttention(config)
File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 109, in __init__
self.query_key_value = nn.Linear(config.hidden_size, 3 * config.hidden_size)
File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 96, in __init__
self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/torch/utils/_device.py", line 62, in __torch_function__
return func(*args, **kwargs)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 38.00 MiB (GPU 0; 47.54 GiB total capacity; 2.36 GiB already allocated; 3.88 MiB free; 2.42 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
q.user
q.client
app_db: <app_utils.db.Database object at 0x7fcc81b74430>
client_initialized: True
mode_curr: error
theme_dark: True
default_aws_bucket_name: bucket_name
default_kaggle_username:
set_max_epochs: 50
set_max_batch_size: 256
set_max_gradient_clip: 10
default_number_of_workers: 8
default_logger: None
default_neptune_project:
default_openai_azure: False
default_openai_api_base: https://example-endpoint.openai.azure.com
default_openai_api_deployment_id: deployment-name
default_openai_api_version: 2023-05-15
default_gpt_eval_max: 100
delete_dialogs: True
chart_plot_max_points: 1000
init_interface: True
notification_bar: None
nav/active: experiment/list
experiment/list/mode: train
dataset/list/df_datasets: id name ... validation rows labels
3 4 ign_500k_combined_sample10k ... None output
2 3 ign_500k_combined ... None output
1 2 lexfridGPT ... None transcript
0 1 oasst ... None output
[4 rows x 10 columns]
experiment/list/df_experiments: id name mode ... progress status info
15 16 cobalt-dragonfly train ... 1.0 finished Runtime: 00:40:14
14 15 quixotic-numbat train ... 0.32 stopped
13 14 opal-caribou train ... 0.0 failed See logs
12 13 therapeutic-toucan train ... 0.0 stopped
11 12 sandy-ant train ... 0.02 stopped
10 11 imperial-warthog train ... 0.0 stopped
9 10 ambitious-akita train ... 0.0 failed See logs
8 9 beneficial-ibis train ... 0.05 failed
7 8 robust-tarantula train ... 0.0 stopped
6 7 beryl-shark train ... 0.0 failed See logs
5 6 unyielding-beagle train ... 0.0 failed See logs
4 5 vagabond-coyote train ... 0.0 failed
3 4 tourmaline-lobster train ... 0.0 failed See logs
2 3 hopeful-porpoise train ... 0.0 failed See logs
1 2 fantastic-cougar train ... 1.0 finished Runtime: 00:01:26
0 1 sly-sawfly train ... 1.0 finished Runtime: 00:00:59
[16 rows x 16 columns]
expander: True
dataset/list: False
dataset/list/table: []
experiment/list: True
experiment/list/table: ['0']
experiment/display/id: 0
experiment/display/logs_path: None
experiment/display/preds_path: None
experiment/display/tab: experiment/display/chat
experiment/display/experiment_id: 16
experiment/display/experiment: <app_utils.db.Experiment object at 0x7fcc82702440>
experiment/display/experiment_path: output/user/cobalt-dragonfly/
experiment/display/refresh: False
experiment/display/download_logs: False
experiment/display/download_predictions: False
experiment/display/download_model: False
experiment/display/push_to_huggingface: False
experiment/list/current: False
experiment/display/chat: True
chat_msg_num: 0
experiment/display/chat/messages: []
chat_settings: True
home: False
report_error: True
experiment/list/refresh: False
experiment/list/compare: False
experiment/list/stop: False
experiment/list/delete: False
experiment/list/new: False
experiment/list/rename: False
experiment/list/stop/table: False
experiment/list/delete/table/dialog: False
q.events
q.args
home: False
report_error: True
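For the graceful-handling option, here is a minimal sketch of what catching the OOM in `chat_tab` could look like. `ui.dialog` and `q.page` are standard Wave primitives; the loader arguments are elided as a placeholder, and the `"meta"` card key is an assumption about how the app registers its meta card:

```python
import torch
from h2o_wave import Q, ui
from app_utils.sections.experiment import load_cfg_model_tokenizer

async def chat_tab(q: Q):
    try:
        # Existing loader from the trace; arguments elided here.
        cfg, model, tokenizer = load_cfg_model_tokenizer(...)
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()  # drop the partially allocated weights
        # Assumes the app registers its meta card under the "meta" key.
        q.page["meta"].dialog = ui.dialog(
            title="Not enough GPU memory",
            closable=True,
            items=[
                ui.text(
                    "Loading the model for chat failed because the GPU "
                    "is out of memory. Free the GPU or choose another "
                    "one, then reopen this tab."
                )
            ],
        )
        await q.page.save()
        return
    # ... continue building the chat UI as before
```

This would replace the generic "Oops! Something went wrong" page with an actionable message and leave the rest of the experiment view usable.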