
[FEATURE] Select GPU for Inference / Graceful OOM Error

Open · RobMulla opened this issue 1 year ago · 0 comments

🚀 Feature

I trained an experiment on a specific GPU of a multi-gpu machine.

When I open the chat tab within the experiment, the model attempts to load on the first GPU; because that GPU is already in use, loading fails with an OOM error and the tab shows an "Oops! Something went wrong" message.

I suggest one of the following:

  • users can select globally which GPUs LLM Studio uses, for both training and inference, or
  • the chat tab allows selecting the GPU to use for inference (see the sketch after this list), or
  • OOM errors are handled more gracefully (see the sketch after the error dump below).
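For the second suggestion, a minimal sketch of what per-experiment GPU selection could look like, assuming PyTorch >= 2.0 (the trace below goes through torch/utils/_device.py, i.e. a device context is already in play). The `gpu_id` argument and the `model_factory` callable are hypothetical stand-ins, not existing LLM Studio API:

```python
import torch

def load_on_selected_gpu(model_factory, gpu_id: int = 0):
    # Hypothetical helper: build the model directly on a user-chosen GPU
    # instead of always defaulting to cuda:0.
    if not (0 <= gpu_id < torch.cuda.device_count()):
        raise ValueError(f"GPU {gpu_id} is not available on this machine")
    # Constructing under a device context (PyTorch >= 2.0) allocates the
    # weights on the target GPU from the start, never touching GPU 0.
    with torch.device(f"cuda:{gpu_id}"):
        model = model_factory()
    return model.eval()

# Usage sketch, mirroring load_cfg_model_tokenizer from the trace below:
# model = load_on_selected_gpu(lambda: cfg.architecture.model_class(cfg), gpu_id=1)
```

As a workaround today, restricting visible devices with CUDA_VISIBLE_DEVICES before launching the app makes the chosen GPU appear as cuda:0, but a setting in the UI would be more convenient.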

Example OOM error on Chat tab

q.app
script_sources: ['/_f/8ec09477-62cc-477b-bc34-2a6318389fa8/tmpcis0c60y.min.js']
initialized: True
wave_utils_stack_trace_str: ### stacktrace
Traceback (most recent call last):

  File "/workspace/./app_utils/handlers.py", line 314, in handle
    await experiment_display(q)

  File "/workspace/./app_utils/sections/experiment.py", line 950, in experiment_display
    await chat_tab(q)

  File "/workspace/./app_utils/sections/experiment.py", line 1168, in chat_tab
    cfg, model, tokenizer = load_cfg_model_tokenizer(

  File "/workspace/./app_utils/sections/experiment.py", line 1901, in load_cfg_model_tokenizer
    model = cfg.architecture.model_class(cfg)

  File "/workspace/./llm_studio/src/models/text_causal_language_modeling_model.py", line 97, in __init__
    self.backbone = create_nlp_backbone(

  File "/workspace/./llm_studio/src/utils/modeling_utils.py", line 501, in create_nlp_backbone
    backbone = model_class.from_config(config, **kwargs)

  File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 425, in from_config
    return model_class._from_config(config, **kwargs)

  File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1143, in _from_config
    model = cls(config, **kwargs)

  File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 604, in __init__
    self.gpt_neox = GPTNeoXModel(config)

  File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 427, in __init__
    self.layers = nn.ModuleList([GPTNeoXLayer(config) for _ in range(config.num_hidden_layers)])

  File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 427, in <listcomp>
    self.layers = nn.ModuleList([GPTNeoXLayer(config) for _ in range(config.num_hidden_layers)])

  File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 318, in __init__
    self.attention = GPTNeoXAttention(config)

  File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 109, in __init__
    self.query_key_value = nn.Linear(config.hidden_size, 3 * config.hidden_size)

  File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 96, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))

  File "/root/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/torch/utils/_device.py", line 62, in __torch_function__
    return func(*args, **kwargs)

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 38.00 MiB (GPU 0; 47.54 GiB total capacity; 2.36 GiB already allocated; 3.88 MiB free; 2.42 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

q.user
q.client
app_db: <app_utils.db.Database object at 0x7fcc81b74430>
client_initialized: True
mode_curr: error
theme_dark: True
default_aws_bucket_name: bucket_name
default_kaggle_username: 
set_max_epochs: 50
set_max_batch_size: 256
set_max_gradient_clip: 10
default_number_of_workers: 8
default_logger: None
default_neptune_project: 
default_openai_azure: False
default_openai_api_base: https://example-endpoint.openai.azure.com
default_openai_api_deployment_id: deployment-name
default_openai_api_version: 2023-05-15
default_gpt_eval_max: 100
delete_dialogs: True
chart_plot_max_points: 1000
init_interface: True
notification_bar: None
nav/active: experiment/list
experiment/list/mode: train
dataset/list/df_datasets:    id                         name  ... validation rows      labels
3   4  ign_500k_combined_sample10k  ...            None      output
2   3            ign_500k_combined  ...            None      output
1   2                   lexfridGPT  ...            None  transcript
0   1                        oasst  ...            None      output

[4 rows x 10 columns]
experiment/list/df_experiments:     id                name   mode  ... progress    status               info
15  16    cobalt-dragonfly  train  ...      1.0  finished  Runtime: 00:40:14
14  15     quixotic-numbat  train  ...     0.32   stopped                   
13  14        opal-caribou  train  ...      0.0    failed           See logs
12  13  therapeutic-toucan  train  ...      0.0   stopped                   
11  12           sandy-ant  train  ...     0.02   stopped                   
10  11    imperial-warthog  train  ...      0.0   stopped                   
9   10     ambitious-akita  train  ...      0.0    failed           See logs
8    9     beneficial-ibis  train  ...     0.05    failed                   
7    8    robust-tarantula  train  ...      0.0   stopped                   
6    7         beryl-shark  train  ...      0.0    failed           See logs
5    6   unyielding-beagle  train  ...      0.0    failed           See logs
4    5     vagabond-coyote  train  ...      0.0    failed                   
3    4  tourmaline-lobster  train  ...      0.0    failed           See logs
2    3    hopeful-porpoise  train  ...      0.0    failed           See logs
1    2    fantastic-cougar  train  ...      1.0  finished  Runtime: 00:01:26
0    1          sly-sawfly  train  ...      1.0  finished  Runtime: 00:00:59

[16 rows x 16 columns]
expander: True
dataset/list: False
dataset/list/table: []
experiment/list: True
experiment/list/table: ['0']
experiment/display/id: 0
experiment/display/logs_path: None
experiment/display/preds_path: None
experiment/display/tab: experiment/display/chat
experiment/display/experiment_id: 16
experiment/display/experiment: <app_utils.db.Experiment object at 0x7fcc82702440>
experiment/display/experiment_path: output/user/cobalt-dragonfly/
experiment/display/refresh: False
experiment/display/download_logs: False
experiment/display/download_predictions: False
experiment/display/download_model: False
experiment/display/push_to_huggingface: False
experiment/list/current: False
experiment/display/chat: True
chat_msg_num: 0
experiment/display/chat/messages: []
chat_settings: True
home: False
report_error: True
experiment/list/refresh: False
experiment/list/compare: False
experiment/list/stop: False
experiment/list/delete: False
experiment/list/new: False
experiment/list/rename: False
experiment/list/stop/table: False
experiment/list/delete/table/dialog: False
q.events
q.args
home: False
report_error: True

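For the third suggestion, a rough sketch of how chat_tab could catch the OOM and show a readable message instead of the generic error page. `experiment_path` and `show_error` are hypothetical placeholders (the real call sites and Wave UI helpers would differ):

```python
import gc
import torch

async def chat_tab(q):
    try:
        # same call that fails in the trace above (experiment.py, line 1168);
        # its real arguments are not shown there, so experiment_path is a placeholder
        cfg, model, tokenizer = load_cfg_model_tokenizer(experiment_path)
    except torch.cuda.OutOfMemoryError:
        # release whatever was partially allocated before reporting the error
        torch.cuda.empty_cache()
        gc.collect()
        # hypothetical UI helper: render a notification instead of crashing the tab
        await show_error(q, "Not enough free memory on the default GPU to load "
                            "the model for chat. Free the GPU or select another "
                            "one for inference.")
        return
    # ... normal chat flow continues here ...
```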
RobMulla · Jun 21 '23 21:06