Export & use T5-Base model for summarization
Hey guys,
I'm pretty new to CoreML conversion and took the naive approach of converting a T5-Base model to CoreML (I want to use it to generate summaries). As laid out in the README, I created an encoder and a decoder model, which worked without a problem:
(base) me@me-MacBook-Pro ~/Development/projects/exporters$ python -m exporters.coreml --model=t5-small --feature=text2text-generation exported ✭main
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Torch version 2.0.0 has not been tested with coremltools. You may run into unexpected errors. Torch 1.12.1 is the most recent version that has been tested.
Converting encoder model...
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
- use_cache -> False
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 755/756 [00:00<00:00, 2482.08 ops/s]
Running MIL Common passes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:00<00:00, 73.01 passes/s]
Running MIL Clean up passes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 27.71 passes/s]
Validating Core ML model...
-[✓] Core ML model output names match reference model ({'last_hidden_state'})
- Validating Core ML model output "last_hidden_state":
-[✓] (1, 128, 768) matches (1, 128, 768)
-[✓] all values close (atol: 0.0001)
All good, model saved at: exported/encoder_Model.mlpackage
Converting decoder model...
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
- use_cache -> False
/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/transformers/modeling_utils.py:828: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if causal_mask.shape[1] < attention_mask.shape[1]:
Skipping token_type_ids input
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 1260/1262 [00:00<00:00, 2404.55 ops/s]
Running MIL Common passes: 5%|████████▊ | 2/39 [00:00<00:02, 15.47 passes/s]/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:135: UserWarning: Output, '1761', of the source model, has been renamed to 'var_1761' in the Core ML model.
warnings.warn(msg.format(var.name, new_name))
Running MIL Common passes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:01<00:00, 36.73 passes/s]
Running MIL Clean up passes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 14.41 passes/s]
Validating Core ML model...
-[✓] Core ML model output names match reference model ({'logits'})
- Validating Core ML model output "logits":
-[✓] (1, 64, 32100) matches (1, 64, 32100)
-[✓] all values close (atol: 0.0001)
All good, model saved at: exported/decoder_Model.mlpackage
This is where the fun begins :) I've only ever worked with the T5 model through transformers & pipelines, like this:
from transformers import T5TokenizerFast, T5ForConditionalGeneration

text = "summarize: The quick brown fox jumps over the lazy dog"

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base", return_dict=True)
model.to('cuda')

tokens = tokenizer(text, return_tensors="pt")
input_ids = tokens.input_ids

# generate() handles the whole encode/decode loop internally
outputs = model.generate(input_ids.cuda(), max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
As far as I understand, by using the model.generate method the transformers utilities do all the heavy lifting here, like creating the attention_mask, running the encoder, passing the encoder_hidden_states along, and so on.
Am I right to assume that I would have to implement all this functionality by hand if I want to work with the CoreML encoder / decoder models?
I don't just want to use them in Python; I'd also like to use them in Swift. But I guess there's no easy plug-and-play solution here, right? :)
Indeed you would have to manage all that stuff yourself.
Edit: It might be useful if we provided some Swift wrapper code for this that would hide the complexity (since it's the same for most Transformer models) but right now we don't have this.
Yikes! I was ready to put my gloves on, but I've now spent two days trying to get the encoder / decoder models to run in Python without going through model.generate, without success (except for generating gibberish sentences :)
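For reference, this is roughly the loop I've been trying to reproduce — a minimal greedy-decoding sketch in Python, assuming the input/output names shown in the validation log above (last_hidden_state, logits) and the decoder input names mentioned later in this thread; your export may expose different names:

import numpy as np
import coremltools as ct
from transformers import T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
encoder = ct.models.MLModel("exported/encoder_Model.mlpackage")
decoder = ct.models.MLModel("exported/decoder_Model.mlpackage")

# Tokenize and pad to the encoder's sequence length (128, per the validation above).
text = "summarize: The quick brown fox jumps over the lazy dog"
enc = tokenizer(text, return_tensors="np", padding="max_length", max_length=128)
input_ids = enc["input_ids"].astype(np.int32)
attention_mask = enc["attention_mask"].astype(np.int32)

# Run the encoder once; its hidden states are reused at every decoder step.
hidden_states = encoder.predict({
    "input_ids": input_ids,
    "attention_mask": attention_mask,
})["last_hidden_state"]

# T5 starts decoding from the pad token and stops at </s>.
decoder_ids = [tokenizer.pad_token_id]
for _ in range(40):
    logits = decoder.predict({
        "decoder_input_ids": np.array([decoder_ids], dtype=np.int32),
        "decoder_attention_mask": np.ones((1, len(decoder_ids)), dtype=np.int32),
        "encoder_last_hidden_state": hidden_states,
        "encoder_attention_mask": attention_mask,
    })["logits"]
    next_id = int(logits[0, -1].argmax())  # greedy: take the most likely token
    if next_id == tokenizer.eos_token_id:
        break
    decoder_ids.append(next_id)

print(tokenizer.decode(decoder_ids, skip_special_tokens=True))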
@hollance Hey, I got around to implementing "that stuff" and have it running in Swift on macOS and iOS now :) However, the converted model runs exclusively on the CPU (although the Performance Report suggests that some layers are available for GPU / ANE processing, see screenshot). Is there anything I can do to make this happen? Right now it works, but it's rather slow.
Hi @seboslaw!
I've recently done a similar exercise and discovered that if the model accepts flexible shapes, then Core ML only uses the CPU. In the case of sequence-to-sequence models such as T5, the decoder is configured to accept inputs whose length is unbounded, as you can see in the Predictions tab of Xcode (1 × 1… means a batch size of 1 and a sequence length of at least 1, with no upper bound):
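You can also check this from Python by inspecting the model spec — a quick sketch, using whatever path your exported package lives at:

import coremltools as ct

spec = ct.models.MLModel("exported/decoder_Model.mlpackage").get_spec()
for inp in spec.description.input:
    ranges = inp.type.multiArrayType.shapeRange.sizeRanges
    # shapeRange is populated for flexible inputs; upperBound == -1 means unbounded
    print(inp.name, [(r.lowerBound, r.upperBound) for r in ranges])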
I tried to work around this issue by using fixed shapes, but so far I've only tested autoregressive models. Using a fixed sequence length of, say, 128 makes it possible for Core ML to engage the GPU (even though the ANE is still unused). I'm not sure if this is practical or even possible for the model you are interested in, as the sequence length depends a lot on your particular use case.
In addition, using fixed shapes requires that you prepare your inputs using padding and the appropriate attention masks, which is a bit more work to be done in the Swift code.
This is a very interesting area for us, and as Matthijs mentioned we are considering whether to create some Swift wrappers and a set of "best practices" for conversion to help with these tasks. (No promises though, we're still assessing the problem :)
Hey @pcuenca, thanks for your reply!
I've tried your suggestion (I think I did :) and updated the upperBounds of the input parameters. However, the Performance Report still says "CPU only" (see below) :(
I used coremltools to edit the inputs of my already converted decoder model:
import coremltools

# Load the exported decoder and grab its spec so the I/O shapes can be edited.
model = coremltools.models.MLModel('../Common/dec.mlpackage')
spec = model.get_spec()

# Cap the flexible sequence dimension (axis 1) of each input and the output.
spec.description.input[0].type.multiArrayType.shapeRange.sizeRanges[1].upperBound = 128
spec.description.input[1].type.multiArrayType.shapeRange.sizeRanges[1].upperBound = 1
spec.description.input[2].type.multiArrayType.shapeRange.sizeRanges[1].upperBound = 1
spec.description.input[3].type.multiArrayType.shapeRange.sizeRanges[1].upperBound = 1
spec.description.output[0].type.multiArrayType.shapeRange.sizeRanges[1].upperBound = 1

# Rebuild the model from the edited spec (weights_dir is required for mlprogram models).
model = coremltools.models.MLModel(spec, weights_dir=model.weights_dir)
model.save("YourNewModel.mlpackage")
Since this didn't seem to work, I looked into providing the inputs to the HF exporters tool directly. But then I saw in the README that "The sequence_length specified in the configuration object is ignored" if "seq2seq" is provided.
I've tried your suggestion (I think I did :) and updated the upperBounds of the input parameters
Sorry, I think I wasn't clear. I didn't mean to make the upper limit bounded, but to use fixed shapes for all dimensions. This is an example of a model where Core ML uses the GPU for all operations: the first dimension is always 1, and the second dimension is always 128. My apologies for the confusion!
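In coremltools terms (just a sketch for illustration, not the exact code path the exporter takes), the difference is whether the sequence dimension is a RangeDim or a plain integer:

import numpy as np
import coremltools as ct

# Flexible: the sequence axis is a RangeDim (upper_bound=-1 means unbounded),
# which is what currently keeps the model on the CPU.
flexible = ct.TensorType(name="decoder_input_ids",
                         shape=(1, ct.RangeDim(lower_bound=1, upper_bound=-1)),
                         dtype=np.int32)

# Fixed: every dimension is a concrete integer, which lets Core ML plan for the GPU.
fixed = ct.TensorType(name="decoder_input_ids", shape=(1, 128), dtype=np.int32)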
Hey @pcuenca,
no worries - you were clear, I simply lack experience with the exporter :) I think I understand what needs to be done now. However, it seems that exporters currently doesn't support this, right?
I need to export the T5 as two separate models, thus providing the seq2seq parameter to my custom MLConfig. However, the README states that if I set sequence_length in my custom MLConfig, it will be ignored:
https://github.com/huggingface/exporters/tree/20e849200d2e4fb29711a7ed8f37c7a16234e60f#exporting-an-encoder-decoder-model
The sequence_length specified in the configuration object is ignored if "seq2seq" is provided.
Why is it this way anyway? And is there a way to get this done aside from patching convert.py?
This is what I've started with (only decoder_input_ids for now):
from collections import OrderedDict

from transformers import T5ForConditionalGeneration, T5TokenizerFast
from exporters.coreml import export
from exporters.coreml.config import InputDescription
from exporters.coreml.models import T5CoreMLConfig

class MyCoreMLConfig(T5CoreMLConfig):
    @property
    def inputs(self) -> OrderedDict[str, InputDescription]:
        input_descs = super().inputs
        # Ask for a fixed sequence length on the decoder input.
        input_descs["decoder_input_ids"].sequence_length = 128
        return input_descs

model_ckpt = "Einmalumdiewelt/T5-Base_GNAD"
base_model = T5ForConditionalGeneration.from_pretrained(model_ckpt, torchscript=True)
preprocessor = T5TokenizerFast.from_pretrained(model_ckpt)

coreml_config = MyCoreMLConfig(base_model.config, task="text2text-generation", seq2seq="decoder")
decoder_mlmodel = export(preprocessor, base_model, coreml_config)
decoder_mlmodel.save('Test.mlpackage')
In the meantime, I've tried editing the MLModel exported by the exporter through coremltools:
import coremltools
from coremltools.proto import FeatureTypes_pb2

# Load the exported decoder and edit its spec.
model = coremltools.models.MLModel('../Common/dec.mlpackage')
spec = model.get_spec()

# Build a fixed-shape MultiArrayType and swap it in for the first input.
new_type = FeatureTypes_pb2.ArrayFeatureType()
new_type.shape.extend([1, 128])
new_type.dataType = FeatureTypes_pb2.ArrayFeatureType.INT32
spec.description.input[0].type.multiArrayType.CopyFrom(new_type)

# Rebuild and save (weights_dir is required for mlprogram models).
model = coremltools.models.MLModel(spec, weights_dir=model.weights_dir)
model.save("YourNewModel.mlpackage")
However, I receive this error:
/opt/homebrew/lib/python3.10/site-packages/coremltools/models/model.py:146: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "compiler error: Encountered an error while compiling a neural network model: validator error: Model input 'decoder_input_ids' has a different shape than its corresponding parameter to main.".
_warnings.warn(
So as far as I understand, modifying an exported MLModel is off the table. @pcuenca Do you think doing it the way described in my previous post will be possible?
Testing T5 is high up on my to-do list; I hope to get to it pretty soon, and hopefully I'll have some insight then :) Sorry for the non-answer though.
@pcuenca no worries and I totally understand :) Could you tell me real quick though why the sequence_length specified in the configuration object is ignored if "seq2seq" is provided? That way I can maybe start digging into the exporters implementation and try to fix it on my end.
I think I originally made it ignore the sequence_length because seq2seq models always need variable-length inputs. Well, unless you're trying to work around Core ML limitations, I guess. ;-)
@seboslaw What you tried to do here used to work, but in newer versions of Core ML it results in the error you've seen. The problem is that the model was compiled with flexible shapes and this is inconsistent with the (fixed) shape you assign later on.
I'm working in a local branch with some quick and dirty modifications to convert T5 using fixed shapes. I can push it later today so that you can keep testing on your end.
@seboslaw This is the branch: https://github.com/huggingface/exporters/pull/37. I have other local changes, so I hope I didn't break or miss anything. I verified that T5 encoder and decoder export with fixed shapes for all their inputs, and that Xcode's performance report successfully chooses the GPU for all operations. I haven't tried to run inference inside an app yet.
@pcuenca awesome! I’ll give it a try as soon as I’m in front of my computer. Thanks a lot already for the effort!
@pcuenca I tried it, but unfortunately it gives different results when compared to the non-GPU model. Hopefully I simply messed up the padding. Right now I'm focusing on the decoder. I padded as follows (see the sketch after this list):
decoder_input_ids: padded with 0s
decoder_attention_mask: leading 1s the size of the unpadded decoder_input_ids
encoder_last_hidden_state (1 x 128 x 768): padded the 2nd dimension (formerly 104, now 128) with zero-filled [768] arrays/tensors
encoder_attention_mask: leading 1s the size of the unpadded decoder_input_ids
Would you say that's correct?
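In numpy terms, this is the scheme I described (a sketch with the example sizes from above; the real encoder output replaces the zeros):

import numpy as np

enc_len, fixed_len, hidden = 104, 128, 768  # unpadded encoder length, padded length, hidden size
dec_len = 1                                 # decoder tokens so far (first step)

decoder_input_ids = np.zeros((1, fixed_len), dtype=np.int32)       # padded with 0s
decoder_attention_mask = np.zeros((1, fixed_len), dtype=np.int32)
decoder_attention_mask[0, :dec_len] = 1                            # leading 1s

encoder_last_hidden_state = np.zeros((1, fixed_len, hidden), dtype=np.float32)
# rows [0, enc_len) hold the real encoder output; the remaining 24 stay zero

encoder_attention_mask = np.zeros((1, fixed_len), dtype=np.int32)
encoder_attention_mask[0, :dec_len] = 1    # leading 1s the size of the unpadded
                                           # decoder_input_ids, as described above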
EDIT: Another problem I found is that the decoder_output.token_scores have the "wrong" dimension. Before, my decoder inputs on the very first run looked like this:
decoder_input_ids: [0]
decoder_attention_mask: [1]
encoder_last_hidden_state: array with dim 1x104x768
encoder_attention_mask: [1]
decoder_output.token_scores then had the output dimension 1x1x768.
With the new model my inputs look like this:
decoder_input_ids: [0,0,0,....0] (dim=128)
decoder_attention_mask: [1,0,0,0,0,....0] (dim=128)
encoder_last_hidden_state: MLMultiArray with dim 1x128x768 (the last 24 tensors of the 2nd dim are filled with 0s)
encoder_attention_mask: [1,0,0,0,0,....0] (dim=128)
decoder_output.token_scores now has the output dimension 1x128x768.
I'm not experienced with the seq2seq model architecture, but aren't the attention_masks supposed to suppress the additional decoder_input_ids entries/padding?
@seboslaw Did you get summarization to work in Swift? How did you implement it? I converted the model, but don't know how to use it, and wasn't able to find much information online.