optimum-intel
Running run_clm.py results in GPU OOM.
I tried to run the neural_compressor/language-modeling example as follows, exactly as in the README. I have a 24 GB GPU, but it still hits a GPU OOM. The model is only 125M parameters, so is this normal? How much GPU RAM do I need?
python run_clm.py \
--model_name_or_path EleutherAI/gpt-neo-125M \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--apply_quantization \
--quantization_approach aware_training \
--apply_pruning \
--target_sparsity 0.02 \
--num_train_epochs 4 \
--max_train_samples 100 \
--do_train \
--do_eval \
--verify_loading \
--output_dir /tmp/clm_output
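To confirm the model itself is tiny, I checked its parameter count and raw fp32 footprint (a quick sketch using the standard transformers API):

from transformers import AutoModelForCausalLM

# Load the same checkpoint used above and measure its fp32 weight size.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters, ~{n_params * 4 / 2**30:.2f} GiB in fp32")

The weights alone come to roughly 0.5 GiB, so the 20+ GiB reported below must come from activations, optimizer state, and the fake-quant observers rather than from the model itself.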
/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
0%| | 0/52 [00:00<?, ?it/s]2023-04-10 13:44:00 [INFO] Fx trace of the entire model failed. We will conduct auto quantization
/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/ao/quantization/observer.py:214: UserWarning: Please use quant_min and quant_max to specify the range for observers. reduce_range will be deprecated in a future release of PyTorch.
warnings.warn(
2023-04-10 13:44:02 [INFO] current target ratio is 0.0
2023-04-10 13:44:03 [INFO] current sparsity ratio is 0.0
/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/ao/quantization/fake_quantize.py:309: UserWarning: _aminmax is deprecated as of PyTorch 1.11 and will be removed in a future release. Use aminmax instead. This warning will only appear once per process. (Triggered internally at ../aten/src/ATen/native/ReduceAllOps.cpp:45.)
return torch.fused_moving_avg_obs_fake_quant(
/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/ao/quantization/fake_quantize.py:309: UserWarning: _aminmax is deprecated as of PyTorch 1.11 and will be removed in a future release. Use aminmax instead. This warning will only appear once per process. (Triggered internally at ../aten/src/ATen/native/TensorCompare.cpp:568.)
return torch.fused_moving_avg_obs_fake_quant(
Traceback (most recent call last):
File "/home/chang/AI/llm/optimum-intel/examples/neural_compressor/language-modeling/run_clm.py", line 732, in <module>
main()
File "/home/chang/AI/llm/optimum-intel/examples/neural_compressor/language-modeling/run_clm.py", line 654, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/transformers/trainer.py", line 1633, in train
return inner_training_loop(
File "/home/chang/AI/llm/optimum-intel/optimum/intel/neural_compressor/trainer.py", line 411, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/transformers/trainer.py", line 2645, in training_step
loss = self.compute_loss(model, inputs)
File "/home/chang/AI/llm/optimum-intel/optimum/intel/neural_compressor/trainer.py", line 699, in compute_loss
outputs = model(**inputs)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/transformers/models/gpt_neo/modeling_gpt_neo.py", line 756, in forward
lm_logits = self.lm_head(hidden_states)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/fx/graph_module.py", line 658, in call_wrapped
return self._wrapped_call(self, *args, **kwargs)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/fx/graph_module.py", line 277, in __call__
raise e
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/fx/graph_module.py", line 267, in __call__
return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "<eval_with_key>.439", line 7, in forward
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1215, in _call_impl
hook_result = hook(self, input, result)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/neural_compressor/adaptor/torch_utils/util.py", line 84, in output_scale_hook
module.output_observer(output)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/ao/quantization/fake_quantize.py", line 309, in forward
return torch.fused_moving_avg_obs_fake_quant(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.54 GiB (GPU 0; 23.68 GiB total capacity; 20.09 GiB already allocated; 1.05 GiB free; 20.36 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
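Incidentally, the failed 1.54 GiB allocation is almost exactly the size of a single fp32 lm_logits tensor, assuming the defaults I believe apply when nothing is passed (per_device_train_batch_size=8, block_size capped at 1024) and GPT-Neo's vocabulary of 50257:

# Back-of-envelope size of one fp32 logits tensor.
# Assumed defaults (not set explicitly in my command): batch=8, block_size=1024.
batch, seq_len, vocab = 8, 1024, 50257
print(f"{batch * seq_len * vocab * 4 / 2**30:.2f} GiB")  # ~1.53 GiB

So the output_scale_hook in the traceback, which runs a fake-quant observer over the full lm_head output, apparently needs at least one more copy of that tensor on top of the 20 GiB already allocated. The error message also suggests max_split_size_mb against fragmentation; if that is relevant here, it can be set before torch makes its first CUDA allocation, e.g.:

import os

# Must be set before the first CUDA allocation (i.e. before importing torch).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
import torch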