Saving external data for large ONNX models
What does this PR do?
Fixes #254
The documentation is not available anymore as the PR was closed or merged.
With the latest commit, we're now able to do:
model = ORTModelForCausalLM.from_pretrained(
model_ckpt,
use_auth_token=True,
from_transformers=True,
cache_dir="model_cache",
onnx_cache_dir="./onnx_cache",  # if the model is large, saves the ONNX model with external data to "./onnx_cache"
)
model = ORTModelForCausalLM.from_pretrained(
model_ckpt,
use_auth_token=True,
from_transformers=True,
cache_dir="model_cache",  # same as the previous behaviour, where `onnx_cache_dir` = `cache_dir`
)
And model.save_pretrained(save_path) would simply copy the files from onnx_cache_dir to the provided save_path.
The following should now work:
# load small ONNX model
model = ORTModelForCausalLM.from_pretrained("nouamanetazi/bloom-small-testing-onnx", use_auth_token=True)
# load large ONNX model (>2GB) by specifying folder containing model's weights
model = ORTModelForCausalLM.from_pretrained("nouamanetazi/bloom-350m-onnx-folder", use_auth_token=True, onnx_folder="onnx")
Example of uploading a large ONNX model (>2GB) to the Hub:
from pathlib import Path
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
import shutil
from huggingface_hub import HfApi
model_ckpt = "bigscience/bloom-350m"
save_path = Path(f"saved_model/{model_ckpt}")
save_path.mkdir(parents=True, exist_ok=True)
tokenizer = AutoTokenizer.from_pretrained(model_ckpt, use_auth_token=True)
model = ORTModelForCausalLM.from_pretrained(
model_ckpt,
use_auth_token=True,
from_transformers=True,
onnx_cache_dir="./onnx_cache", # saves ONNX model to "./onnx_cache"
)
# save to local folder
model.save_pretrained(save_path / "onnx")
shutil.move(save_path / "onnx" / "config.json", save_path / "config.json")
tokenizer.save_pretrained(save_path)
# push to hub
repo_id = "nouamanetazi/bloom-350m-onnx-folder-test"
api = HfApi()
api.create_repo(repo_id=repo_id, exist_ok=True)
api.upload_folder(folder_path=save_path, repo_id=repo_id, path_in_repo=".", repo_type="model")
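To check the round trip, the model can presumably be loaded back from the Hub afterwards (a sketch using the onnx_folder argument from this PR, pointing at the subfolder that holds the ONNX file and its external data):
# load the large ONNX model back from the Hub
model = ORTModelForCausalLM.from_pretrained(repo_id, use_auth_token=True, onnx_folder="onnx")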
We can now save/load a large ORTModelForSeq2SeqLM:
from pathlib import Path
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM
model_ckpt = "facebook/mbart-large-en-ro"
save_path = Path(f"saved_model/{model_ckpt}")
save_path.mkdir(parents=True, exist_ok=True)
tokenizer = AutoTokenizer.from_pretrained(model_ckpt, use_auth_token=True)
model = ORTModelForSeq2SeqLM.from_pretrained(
model_ckpt,
use_auth_token=True,
from_transformers=True,
)
# save to local folder
model.save_pretrained(save_path / "onnx")
tokenizer.save_pretrained(save_path)
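And, presumably, the model can then be reloaded from that local folder (a minimal sketch, not verified here):
# reload the exported seq2seq model from the local "onnx" subfolder
model = ORTModelForSeq2SeqLM.from_pretrained(save_path / "onnx")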
Awesome! I think it would be great to add tests, essentially checking that saving/reloading works well in both the encoder-only and encoder-decoder cases.
For the tests, it would be cool if we could force saving a small model in external data format. I looked quickly, but there doesn't seem to be an easy way to bypass the 2GB protobuf file limit. Will try to add tests once I have time @fxmarty
You could use the following API to convert a small model to external data: converting-an-onnx-model-to-external-data
The size threshold has to be set low so that it actually creates the external data files.
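For reference, a minimal sketch of that conversion (the file names below are placeholders, and size_threshold=0 forces even tiny tensors into the external data file):
import onnx
from onnx.external_data_helper import convert_model_to_external_data

# "small_model.onnx" is a placeholder path to any small exported model
onnx_model = onnx.load("small_model.onnx")
# mark all tensors to be stored externally, regardless of their size
convert_model_to_external_data(
    onnx_model,
    all_tensors_to_one_file=True,
    location="small_model.onnx_data",
    size_threshold=0,
    convert_attribute=False,
)
# saving then writes the raw tensor data to "small_model.onnx_data" next to the proto
onnx.save_model(onnx_model, "small_model_external.onnx")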
Hi @NouamaneTazi, thanks for the PR. It would require a small change to handle one more use case for modeling_seq2seq and modeling_decoder.
Taking the Seq2Seq class as an example, it generates three different models: encoder.onnx, decoder.onnx and decoder_with_past.onnx. These are currently generated in the same folder, so if there are external data files, there is a chance of an overwrite when they have the same names. See: 26983
A possible fix is to save them in subfolders, e.g. encoder/encoder.onnx, decoder/decoder.onnx, etc. The same change would be required in the exporters.
We should probably do the same in the exporters, actually.
I'm trying to write tests for saving/loading with external data, but it's not as trivial as it seems. I tried applying your suggestion @mht-sharma by using:
import os
import onnx

model = ORTModelForSeq2SeqLM.from_pretrained(self.ONNX_SEQ2SEQ_MODEL_ID, use_cache=True)
model.save_pretrained(tmpdirname)
# load the model proto
onnx_model = onnx.load(str(model.model_path))
# re-save it with external data, forcing even tiny tensors out of the proto (size_threshold=8)
os.makedirs(str(model.model_path.parent / "external_data"), exist_ok=True)
onnx.save_model(onnx_model, str(model.model_path.parent / "external_data" / "model.onnx"), save_as_external_data=True, all_tensors_to_one_file=False, size_threshold=8, convert_attribute=False)
# this would need to be repeated for each of encoder/decoder/decoder_with_past
But again, this wouldn't test our model.save_pretrained API at all, because in our API the export to ONNX is done with torch.onnx.export here, which doesn't accept an argument to force the external data format.
I'm open to suggestions, or else we can merge this for now.
@NouamaneTazi Why not use actual >2GB models, randomly initialized and saved from transformers (so no download time)? Then there would be no need for custom logic.
@fxmarty Yes, definitely! I can use a randomly initialized model, but it seems there's no exposed API to load, for example, an ORTModelForSequenceClassification from a BertForSequenceClassification instance?
You can call save_pretrained() on the PreTrainedModel, and then from_pretrained() from the local folder using the ORTModel class.
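A minimal sketch of that route (the config values are made up, just large enough that the exported proto should exceed the 2GB limit; not tested here):
import tempfile
from transformers import BertConfig, BertForSequenceClassification
from optimum.onnxruntime import ORTModelForSequenceClassification

# hypothetical config: ~660M parameters, i.e. roughly 2.6GB in fp32, above the 2GB protobuf limit
config = BertConfig(hidden_size=2048, num_hidden_layers=12, num_attention_heads=16, intermediate_size=8192)
pt_model = BertForSequenceClassification(config)

with tempfile.TemporaryDirectory() as tmpdir:
    # save the randomly initialized transformers model locally...
    pt_model.save_pretrained(tmpdir)
    # ...then export it from the local folder with the ORT class
    ort_model = ORTModelForSequenceClassification.from_pretrained(tmpdir, from_transformers=True)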
@NouamaneTazi

from pathlib import Path
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_ckpt = "facebook/mbart-large-en-ro"
save_path = Path(f"saved_model/{model_ckpt}")
save_path.mkdir(parents=True, exist_ok=True)
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = ORTModelForSeq2SeqLM.from_pretrained(
    model_ckpt,
    from_transformers=True,
)
model.save_pretrained(save_path / "onnx")
tokenizer.save_pretrained(save_path)
Log:
<Trial 2015437 worker_0> genius $ python3 /opt/tiger/genius/tensorrt/load.py
2022-12-15 09:13:01.539465: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:239: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:246: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:278: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:912: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if input_shape[-1] > 1:
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:100: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min))
Traceback (most recent call last):
File "/opt/tiger/genius/tensorrt/load.py", line 55, in
Environment: optimum-1.5.1
I have a similar issue to the one mentioned here: https://github.com/huggingface/optimum/issues/589#issuecomment-1352465502
Migrated this PR to https://github.com/huggingface/optimum/pull/586