
Saving external data for large ONNX models

NouamaneTazi opened this pull request 3 years ago • 3 comments

What does this PR do?

Fixes #254

NouamaneTazi avatar Jul 01 '22 17:07 NouamaneTazi

The documentation is not available anymore as the PR was closed or merged.

With the latest commit, we're now able to do:

model = ORTModelForCausalLM.from_pretrained(
    model_ckpt,
    use_auth_token=True,
    from_transformers=True,
    cache_dir="model_cache",
    onnx_cache_dir="./onnx_cache",  # saves the ONNX model (with external data if the model is large) to "./onnx_cache"
)

model = ORTModelForCausalLM.from_pretrained(
    model_ckpt,
    use_auth_token=True,
    from_transformers=True,
    cache_dir="model_cache"  # same as the previous behaviour, where onnx_cache_dir = cache_dir
)

And model.save_pretrained(save_path) simply copies the files from onnx_cache_dir to the provided save_path.
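A minimal sketch of that flow (the save_path value is illustrative):

save_path = "./exported_onnx"
model.save_pretrained(save_path)  # copies the ONNX file(s), including any external data files, from onnx_cache_dir into save_path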

NouamaneTazi avatar Jul 02 '22 15:07 NouamaneTazi

The following should now work:

# load small ONNX model
model = ORTModelForCausalLM.from_pretrained("nouamanetazi/bloom-small-testing-onnx", use_auth_token=True)
# load large ONNX model (>2GB) by specifying folder containing model's weights
model = ORTModelForCausalLM.from_pretrained("nouamanetazi/bloom-350m-onnx-folder", use_auth_token=True, onnx_folder="onnx")
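
For a quick sanity check on the loaded model, a generation sketch (it assumes the same repo also hosts the tokenizer files):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nouamanetazi/bloom-350m-onnx-folder", use_auth_token=True)
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))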

Example of uploading a large ONNX model (>2GB) to the hub

from pathlib import Path
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
import shutil
from huggingface_hub import HfApi

model_ckpt = "bigscience/bloom-350m"
save_path = Path(f"saved_model/{model_ckpt}")
save_path.mkdir(parents=True, exist_ok=True)

tokenizer = AutoTokenizer.from_pretrained(model_ckpt, use_auth_token=True)
model = ORTModelForCausalLM.from_pretrained(
    model_ckpt,
    use_auth_token=True,
    from_transformers=True,
    onnx_cache_dir="./onnx_cache",  # saves ONNX model to "./onnx_cache"
)

# save to local folder
model.save_pretrained(save_path / "onnx")
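# config.json was saved under the "onnx" subfolder; move it up so it sits at the repo root next to the tokenizer files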
shutil.move(save_path / "onnx" / "config.json", save_path / "config.json")
tokenizer.save_pretrained(save_path)

# push to hub
repo_id = "nouamanetazi/bloom-350m-onnx-folder-test"
api = HfApi()
api.create_repo(repo_id=repo_id, exist_ok=True)
api.upload_folder(folder_path=save_path, repo_id=repo_id, path_in_repo=".", repo_type="model")

NouamaneTazi avatar Jul 02 '22 18:07 NouamaneTazi

We can now save/load large ORTModelForSeq2SeqLM

from pathlib import Path
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_ckpt = "facebook/mbart-large-en-ro"
save_path = Path(f"saved_model/{model_ckpt}")
save_path.mkdir(parents=True, exist_ok=True)

tokenizer = AutoTokenizer.from_pretrained(model_ckpt, use_auth_token=True)
model = ORTModelForSeq2SeqLM.from_pretrained(
    model_ckpt,
    use_auth_token=True,
    from_transformers=True,
)

# save to local folder
model.save_pretrained(save_path / "onnx")
tokenizer.save_pretrained(save_path)
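
To sanity-check the round trip, a reload-and-generate sketch (it assumes the layout written above; the input sentence is arbitrary):

# reload the exported ONNX model and tokenizer from the local folders
model = ORTModelForSeq2SeqLM.from_pretrained(save_path / "onnx")
tokenizer = AutoTokenizer.from_pretrained(save_path)

inputs = tokenizer("UN Chief Says There Is No Military Solution in Syria", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))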

NouamaneTazi avatar Dec 07 '22 13:12 NouamaneTazi

Awesome! I think it would be great to add tests, essentially checking that saving/reloading works well in the encoder-only and encoder-decoder cases.

fxmarty avatar Dec 07 '22 14:12 fxmarty

For the tests, it would be cool if we could force a small model to be saved in external data format. I took a quick look, but there doesn't seem to be an easy way to bypass the 2GB protobuf file limit. Will try to add tests once I have time @fxmarty

NouamaneTazi avatar Dec 07 '22 15:12 NouamaneTazi

For the tests, it would be cool if we could force a small model to be saved in external data format. I took a quick look, but there doesn't seem to be an easy way to bypass the 2GB protobuf file limit. Will try to add tests once I have time @fxmarty

You could use the following API to convert a small model to external data: converting-an-onnx-model-to-external-data

The size threshold has to be set low so that the external data files actually get created.
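
For instance, something along these lines (a sketch based on that API; the file names and the low size_threshold are illustrative):

import onnx
from onnx.external_data_helper import convert_model_to_external_data

onnx_model = onnx.load("model.onnx")
# mark every tensor larger than ~1KB to be stored outside the protobuf file
convert_model_to_external_data(
    onnx_model,
    all_tensors_to_one_file=True,
    location="model.onnx_data",
    size_threshold=1024,
    convert_attribute=False,
)
onnx.save_model(onnx_model, "small_model_external_data.onnx")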

mht-sharma avatar Dec 08 '22 07:12 mht-sharma

Hi @NouamaneTazi, thanks for the PR! It would require a small change to handle one more use case for modeling_seq2seq and modeling_decoder.

Taking the Seq2Seq class as an example: it generates three different models, encoder.onnx, decoder.onnx, and decoder_with_past.onnx, which are currently all written to the same folder. If their external data files have the same names, one can overwrite another. See: 26983

A possible fix is to save them in subfolders, e.g. encoder/encoder.onnx, decoder/decoder.onnx, etc. The same change would be required in the exporters.
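
Roughly, the resulting layout would look like this (the external data file names below are hypothetical):

onnx/
  encoder/
    encoder.onnx
    encoder.onnx_data
  decoder/
    decoder.onnx
    decoder.onnx_data
  decoder_with_past/
    decoder_with_past.onnx
    decoder_with_past.onnx_data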

mht-sharma avatar Dec 08 '22 07:12 mht-sharma

We should probably do the same in exporters, actually.

fxmarty avatar Dec 08 '22 09:12 fxmarty

I'm trying to write tests for saving/loading with external data, but it's not as trivial as it seems. I tried applying your suggestion @mht-sharma with:

import os
import onnx

model = ORTModelForSeq2SeqLM.from_pretrained(self.ONNX_SEQ2SEQ_MODEL_ID, use_cache=True)
model.save_pretrained(tmpdirname)

# load the model proto
onnx_model = onnx.load(str(model.model_path))

# save it again, forcing external data
os.makedirs(str(model.model_path.parent / "external_data"), exist_ok=True)
onnx.save_model(
    onnx_model,
    str(model.model_path.parent / "external_data" / "model.onnx"),
    save_as_external_data=True,
    all_tensors_to_one_file=False,
    size_threshold=8,
    convert_attribute=False,
)

# the same would need to be done for the encoder/decoder/decoder_with_past models

But again, this wouldn't test our model.save_pretrained API at all, because in our API the export to ONNX is done using torch.onnx.export here, which doesn't accept an argument to specify the external data format.

I'm open to suggestions, or else we can merge this as is for now.

NouamaneTazi avatar Dec 12 '22 15:12 NouamaneTazi

@NouamaneTazi Why not use actual >2GB models, randomly initialized and saved from transformers (so no download time)? Then there's no need for custom logic.

fxmarty avatar Dec 12 '22 16:12 fxmarty

@fxmarty Yes, definitely! I can use a randomly initialized model, but it seems there's no exposed API to load, for example, an ORTModelForSequenceClassification from a BertForSequenceClassification instance?

NouamaneTazi avatar Dec 12 '22 23:12 NouamaneTazi

@fxmarty Yes, definitely! I can use a randomly initialized model, but it seems there's no exposed API to load, for example, an ORTModelForSequenceClassification from a BertForSequenceClassification instance?

You can call save_pretrained() on the PreTrainedModel, and then from_pretrained() from the local folder using ORTModel.
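
A minimal sketch of that flow (the model class, config, and local path are illustrative; for an external-data test the config would have to be large enough for the export to exceed 2GB):

from transformers import BertConfig, BertForSequenceClassification
from optimum.onnxruntime import ORTModelForSequenceClassification

# randomly initialized model, so nothing needs to be downloaded
config = BertConfig()  # enlarge (more layers / bigger hidden size) to push the export past the 2GB limit
pt_model = BertForSequenceClassification(config)
pt_model.save_pretrained("local_bert")

# export to ONNX by loading from the local folder
ort_model = ORTModelForSequenceClassification.from_pretrained("local_bert", from_transformers=True)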

fxmarty avatar Dec 13 '22 08:12 fxmarty

@NouamaneTazi

from pathlib import Path
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_ckpt = "facebook/mbart-large-en-ro"
save_path = Path(f"saved_model/{model_ckpt}")
save_path.mkdir(parents=True, exist_ok=True)

tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = ORTModelForSeq2SeqLM.from_pretrained(
    model_ckpt,
    from_transformers=True,
)
model.save_pretrained(save_path / "onnx")
tokenizer.save_pretrained(save_path)

Log:

<Trial 2015437 worker_0> genius $ python3 /opt/tiger/genius/tensorrt/load.py
2022-12-15 09:13:01.539465: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:239: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
(similar TracerWarnings for modeling_mbart.py lines 246, 278 and 912, and a torch.tensor constants warning at line 100, omitted)
Traceback (most recent call last):
  File "/opt/tiger/genius/tensorrt/load.py", line 55, in
    from_transformers=True,
  File "/usr/local/lib/python3.7/dist-packages/optimum/onnxruntime/modeling_ort.py", line 280, in from_pretrained
    **kwargs,
  File "/usr/local/lib/python3.7/dist-packages/optimum/modeling_base.py", line 263, in from_pretrained
    **model_kwargs,
  File "/usr/local/lib/python3.7/dist-packages/optimum/onnxruntime/modeling_seq2seq.py", line 597, in _from_transformers
    output=save_dir.joinpath(ONNX_DECODER_NAME),
  File "/home/tiger/.local/lib/python3.7/site-packages/transformers/onnx/convert.py", line 353, in export
    return export_pytorch(preprocessor, model, config, opset, output, tokenizer=tokenizer, device=device)
  File "/home/tiger/.local/lib/python3.7/site-packages/transformers/onnx/convert.py", line 204, in export_pytorch
    raise err
  File "/home/tiger/.local/lib/python3.7/site-packages/transformers/onnx/convert.py", line 189, in export_pytorch
    opset_version=opset,
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/onnx/__init__.py", line 280, in export
    custom_opsets, enable_onnx_checker, use_external_data_format)
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/onnx/utils.py", line 94, in export
    use_external_data_format=use_external_data_format)
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/onnx/utils.py", line 706, in _export
    val_add_node_names, val_use_external_data_format, model_file_location)
RuntimeError: Exporting model exceed maximum protobuf size of 2GB. Please call torch.onnx.export with use_external_data_format=True.

Environment: optimum 1.5.1

PoodleWang avatar Dec 15 '22 01:12 PoodleWang

I have a similar issue to the one mentioned here: https://github.com/huggingface/optimum/issues/589#issuecomment-1352465502

PoodleWang avatar Dec 15 '22 03:12 PoodleWang

Migrated this PR to https://github.com/huggingface/optimum/pull/586

NouamaneTazi avatar Dec 16 '22 16:12 NouamaneTazi