ipex-llm
[Nano] Enabling both ipex 1.11 and bf16 raises AttributeError
Description
Running Trainer.fit with both ipex 1.11 and bf16 enabled produces the error below:
Environment
Python=3.7.13
torch=1.11.0
pytorch_lightning=1.6.4
ipex=1.11.0
bigdl-nano=2.1.0b20220801
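
To make the failing call concrete, here is a minimal self-contained sketch; the toy module and random data are placeholders rather than the real model from train.py, and `use_ipex` / `enable_bf16` are the flags referred to throughout this issue:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from bigdl.nano.pytorch import Trainer

# Toy stand-in for the real model in train.py, only to show the failing call.
class ToyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(16, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

loader = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))),
    batch_size=8,
)

# Enabling both flags together is what triggers the AttributeError below.
trainer = Trainer(max_epochs=1, use_ipex=True, enable_bf16=True)
trainer.fit(ToyModule(), train_dataloaders=loader)
```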
In bigdl-nano 2.1.0b20220802, it raises another error as follows:
Traceback (most recent call last):
File "train.py", line 146, in <module>
main(args)
File "train.py", line 98, in main
val_dataloaders=val_loader)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
results = self._run_stage()
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
return self._run_train()
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1353, in _run_train
self.fit_loop.run()
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 269, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
batch_output = self.batch_loop.run(batch, batch_idx)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 207, in advance
self.optimizer_idx,
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 378, in _optimizer_step
using_lbfgs=is_lbfgs,
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1595, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py", line 1646, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 155, in optimizer_step
return optimizer.step(closure=closure, **kwargs)
File "/opt/workspace/dax/anaconda3/envs/recsys/lib/python3.7/site-packages/intel_extension_for_pytorch/optim/_optimizer_utils.py", line 54, in master_param_non_fused_step
k.grad = value['bf16_param'].grad.detach().float()
AttributeError: 'NoneType' object has no attribute 'detach'
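
The failing line in master_param_non_fused_step suggests that the bf16 copy of at least one parameter never received a gradient (its .grad is None). Below is a small, hypothetical debugging helper in plain PyTorch (not part of bigdl-nano or IPEX) that prints parameter dtypes and gradient availability after a backward pass:

```python
import torch

def report_param_grads(model: torch.nn.Module) -> None:
    # Hypothetical helper: list each parameter's dtype and whether it has a
    # gradient. If the bf16 conversion took effect, some parameters should be
    # torch.bfloat16; a grad of None after backward() matches the
    # AttributeError in the traceback above.
    for name, p in model.named_parameters():
        grad_dtype = None if p.grad is None else p.grad.dtype
        print(f"{name}: param={p.dtype}, requires_grad={p.requires_grad}, grad={grad_dtype}")
```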
By the way, in my experiments the test accuracy is identical whether enable_bf16 is True or False, which makes me wonder whether this option actually takes effect (see the sanity check sketched after the table below).
| enable_bf16 | Fit Time | Test Accuracy | Test Loss |
|---|---|---|---|
| False | 4802.48s | 55.5208% | 1.1799 |
| True | 4779.57s | 55.5208% | 1.1799 |
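
One way to check whether bf16 actually took effect is to inspect the dtypes of the model's parameters after Trainer setup; this is a hypothetical sanity check in plain PyTorch, not a bigdl-nano API:

```python
import torch

# Hypothetical sanity check: `model` is assumed to be the module passed to
# trainer.fit. If IPEX converted it to bf16, torch.bfloat16 should appear
# among the parameter dtypes; all-float32 would suggest enable_bf16 had no effect.
dtypes = {p.dtype for p in model.parameters()}
print(dtypes)
```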
The model architecture is as follows:
| Name | Type | Params
--------------------------------------------------
0 | cross_encoder | XLMRobertaModel | 117 M
1 | fc_layer | Sequential | 1.5 K
--------------------------------------------------
117 M Trainable params
0 Non-trainable params
117 M Total params
470.569 Total estimated model params size (MB)
I tried enabling both ipex and bf16, but got a different error message: `the input and weight need have same data type`. Can you share the code so that I can reproduce this issue?
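
As a side note, that kind of dtype-mismatch error typically comes from feeding float32 inputs into weights that have already been converted to bf16. A small hypothetical illustration in plain PyTorch, unrelated to the bigdl-nano code path:

```python
import torch
from torch import nn

# Hypothetical illustration: weights in bf16 (as after a bf16 conversion)
# combined with a float32 input is the usual source of
# "input and weight need have same data type"-style errors.
layer = nn.Linear(8, 4).to(torch.bfloat16)
x = torch.randn(2, 8)                      # float32 input

y = layer(x.to(torch.bfloat16))            # casting the input avoids the mismatch
print(y.dtype)                             # torch.bfloat16
```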
Sure. Please take a look at this repo.