optimum-intel
Running run_clm.py results in GPU OOM.
I tried to run the neural_compressor/language-modeling example as follows, exactly as in the README. I have a 24 GB GPU, but it still hits a GPU OOM. The model is only 125M parameters, so is this normal? How much GPU RAM do I need?
python run_clm.py \
--model_name_or_path EleutherAI/gpt-neo-125M \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--apply_quantization \
--quantization_approach aware_training \
--apply_pruning \
--target_sparsity 0.02 \
--num_train_epochs 4 \
--max_train_samples 100 \
--do_train \
--do_eval \
--verify_loading \
--output_dir /tmp/clm_output
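To confirm the model itself is tiny, I checked its parameter count and raw fp32 footprint (a quick sketch using the standard transformers API):

from transformers import AutoModelForCausalLM

# Load the same checkpoint used above and measure its fp32 weight size.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters, ~{n_params * 4 / 2**30:.2f} GiB in fp32")

The weights alone come to roughly 0.5 GiB, so the 20+ GiB reported below must come from activations, optimizer state, and the fake-quant observers rather than from the model itself.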
/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
0%| | 0/52 [00:00<?, ?it/s]2023-04-10 13:44:00 [INFO] Fx trace of the entire model failed. We will conduct auto quantization
/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/ao/quantization/observer.py:214: UserWarning: Please use quant_min and quant_max to specify the range for observers. reduce_range will be deprecated in a future release of PyTorch.
warnings.warn(
2023-04-10 13:44:02 [INFO] current target ratio is 0.0
2023-04-10 13:44:03 [INFO] current sparsity ratio is 0.0
/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/ao/quantization/fake_quantize.py:309: UserWarning: _aminmax is deprecated as of PyTorch 1.11 and will be removed in a future release. Use aminmax instead. This warning will only appear once per process. (Triggered internally at ../aten/src/ATen/native/ReduceAllOps.cpp:45.)
return torch.fused_moving_avg_obs_fake_quant(
/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/ao/quantization/fake_quantize.py:309: UserWarning: _aminmax is deprecated as of PyTorch 1.11 and will be removed in a future release. Use aminmax instead. This warning will only appear once per process. (Triggered internally at ../aten/src/ATen/native/TensorCompare.cpp:568.)
return torch.fused_moving_avg_obs_fake_quant(
Traceback (most recent call last):
File "/home/chang/AI/llm/optimum-intel/examples/neural_compressor/language-modeling/run_clm.py", line 732, in <module>
main()
File "/home/chang/AI/llm/optimum-intel/examples/neural_compressor/language-modeling/run_clm.py", line 654, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/transformers/trainer.py", line 1633, in train
return inner_training_loop(
File "/home/chang/AI/llm/optimum-intel/optimum/intel/neural_compressor/trainer.py", line 411, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/transformers/trainer.py", line 2645, in training_step
loss = self.compute_loss(model, inputs)
File "/home/chang/AI/llm/optimum-intel/optimum/intel/neural_compressor/trainer.py", line 699, in compute_loss
outputs = model(**inputs)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/transformers/models/gpt_neo/modeling_gpt_neo.py", line 756, in forward
lm_logits = self.lm_head(hidden_states)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/fx/graph_module.py", line 658, in call_wrapped
return self._wrapped_call(self, *args, **kwargs)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/fx/graph_module.py", line 277, in __call__
raise e
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/fx/graph_module.py", line 267, in __call__
return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "<eval_with_key>.439", line 7, in forward
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1215, in _call_impl
hook_result = hook(self, input, result)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/neural_compressor/adaptor/torch_utils/util.py", line 84, in output_scale_hook
module.output_observer(output)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/ao/quantization/fake_quantize.py", line 309, in forward
return torch.fused_moving_avg_obs_fake_quant(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.54 GiB (GPU 0; 23.68 GiB total capacity; 20.09 GiB already allocated; 1.05 GiB free; 20.36 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
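Incidentally, the failed 1.54 GiB allocation is almost exactly the size of a single fp32 lm_logits tensor, assuming the defaults I believe apply when nothing is passed (per_device_train_batch_size=8, block_size capped at 1024) and GPT-Neo's vocabulary of 50257:

# Back-of-envelope size of one fp32 logits tensor.
# Assumed defaults (not set explicitly in my command): batch=8, block_size=1024.
batch, seq_len, vocab = 8, 1024, 50257
print(f"{batch * seq_len * vocab * 4 / 2**30:.2f} GiB")  # ~1.53 GiB

So the output_scale_hook in the traceback, which runs a fake-quant observer over the full lm_head output, apparently needs at least one more copy of that tensor on top of the 20 GiB already allocated. The error message also suggests max_split_size_mb against fragmentation; if that is relevant here, it can be set before torch makes its first CUDA allocation, e.g.:

import os

# Must be set before the first CUDA allocation (i.e. before importing torch).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
import torch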