"ValueError: Torch var mask.3 not found in context" when converting PyTorch model
🐞 Describing the bug
When attempting to convert the text model within Huggingface Transformers' version of CLIP, the conversion fails due to an unrecognized input for a triu op.
Stack Trace
Converting PyTorch Frontend ==> MIL Ops: 4%|██▌ | 31/811 [00:00<00:00, 3040.63 ops/s]
Traceback (most recent call last):
  File "/Users/austin/Projects/openai-clip-trace-v/trace_clip.py", line 83, in <module>
    ml_model = ct.convert(
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/_converters_entry.py", line 451, in convert
    mlmodel = mil_convert(
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 193, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 220, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 283, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 115, in __call__
    return load(*args, **kwargs)
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 53, in load
    return _perform_torch_convert(converter, debug)
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 92, in _perform_torch_convert
    prog = converter.convert()
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 269, in convert
    convert_nodes(self.context, self.graph)
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 92, in convert_nodes
    add_op(context, node)
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 4031, in triu
    inputs = _get_inputs(context, node, expected=2)
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 190, in _get_inputs
    inputs = [context[name] for name in node.inputs]
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 190, in <listcomp>
    inputs = [context[name] for name in node.inputs]
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 82, in __getitem__
    raise ValueError(
ValueError: Torch var mask.3 not found in context
To Reproduce
Setup environment
conda create --name clip-triu-bug --yes \
    "pytorch::pytorch>=1.10,<1.11" \
    huggingface::transformers \
    python-dateutil \
    pillow
conda activate clip-triu-bug
pip3 install coremltools==6.0b2
Run clip_text_convert.py
import torch
import numpy as np
import coremltools as ct
from transformers.models.clip.processing_clip import CLIPProcessor
from transformers.models.clip.modeling_clip import CLIPModel
# Combine CLIP Text Model and Text Projection for exporting as one
# Disabled text_projection as it's not needed to reproduce the error
class CLIPTextEmbedder(torch.nn.Module):
    def __init__(self, text_model):
        super(CLIPTextEmbedder, self).__init__()
        self.text_model = text_model
        # self.text_projection = text_projection

    def forward(self, input_ids):
        pooled_out = self.text_model(input_ids)[1]
        # embedded_out = self.text_projection(pooled_out)
        return pooled_out  # embedded_out
# Load the pretrained CLIP Model from Huggingface Transformers
pretrained_identifier = 'openai/clip-vit-base-patch32'
processor = CLIPProcessor.from_pretrained(pretrained_identifier)
# return_dict=False so that it returns tuples... needed for tracing
model = CLIPModel.from_pretrained(pretrained_identifier, return_dict=False)
# Isolate text embedding into separate models for export
text_for_export = CLIPTextEmbedder(model.text_model)#, model.text_projection)
# Pre-processing text for trace
text = "a painting of a corgi wearing a royal cape and crown"
# Since we're tracing, we should pad the text so that different-length inputs are pre-processed to a single input shape
preprocessed_text = processor(text=text, padding='max_length', return_tensors='pt').input_ids
with torch.no_grad():
    traced_model = torch.jit.trace(text_for_export, preprocessed_text)
    text_outputs = traced_model(preprocessed_text)

ml_model = ct.convert(
    model=traced_model,
    inputs=[
        ct.TensorType(shape=preprocessed_text.shape, dtype=np.int64)
    ]
)
# Export to CLIPTextEmbedder[openai_clip-vit-base-patch32].mlmodel
out_filename = f'{type(text_for_export).__name__}[{pretrained_identifier.replace("/", "_")}].mlmodel'
ml_model.save(out_filename)
System environment (please complete the following information):
- coremltools version: 6.0b2
- OS (e.g. MacOS version or Linux type): macOS Monterey v12.2.1
- Any other relevant version information (e.g. PyTorch or TensorFlow version):
  - PyTorch version: 1.10.2
  - transformers version: 4.21.1
Additional context
- Using coremltools version 5.2 yields a different error, as it's missing the changes from PR #1439 and PR #1454, which allow version 6.0b2 to progress and then produce this error.
@AustinStarnes - can you give us a minimal self-contained example, something that uses a much simpler model?
You can work around this by patching the _build_causal_attention_mask() function in CLIPTextTransformer as follows:
def _build_causal_attention_mask(self, bsz, seq_len, dtype):
    # lazily create causal attention mask, with full attention between the vision tokens
    # pytorch uses additive attention mask; fill with -inf
    # Old code - gives a warning when traced that torch.tensor() is traced as constant
    # mask = torch.empty(bsz, seq_len, seq_len, dtype=dtype)
    # mask.fill_(torch.tensor(torch.finfo(dtype).min))
    # Replace with:
    mask = torch.full((bsz, seq_len, seq_len), torch.finfo(dtype).min)
    mask.triu_(1)  # zero out the lower diagonal
    mask = mask.unsqueeze(1)  # expand mask
    return mask
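For reference, a minimal sketch of one way to apply that patch at runtime, by overriding the method on CLIPTextTransformer before tracing. The import path and method name are assumed from the transformers version pinned above; adjust them if your version differs.

import torch
from transformers.models.clip.modeling_clip import CLIPTextTransformer

def _patched_build_causal_attention_mask(self, bsz, seq_len, dtype):
    # Build the additive causal mask with torch.full so the tracer records the
    # construction instead of folding a torch.tensor() call into a constant.
    mask = torch.full((bsz, seq_len, seq_len), torch.finfo(dtype).min)
    mask.triu_(1)             # zero the diagonal and below; keep -inf strictly above it
    mask = mask.unsqueeze(1)  # add the broadcast dimension for the attention heads
    return mask

# Assumed usage: install the override before calling torch.jit.trace on the model.
CLIPTextTransformer._build_causal_attention_mask = _patched_build_causal_attention_mask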
Indeed, given that the issue appears to be a result of the tracer, I'm not sure this is a coremltools bug?
Same problem, is there any progress? @AustinStarnes
This issue seems to be solved in coremltools==6.3.0. You have to use int32 instead of int64.
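For reference, a sketch of how the ct.convert call from the reproduction script above would change under that suggestion, assuming coremltools 6.3.0 (or later) is installed and traced_model / preprocessed_text are defined as in the script:

import numpy as np
import coremltools as ct

# Same conversion as in the reproduction script, but declaring the input
# tensor as int32 instead of int64.
ml_model = ct.convert(
    model=traced_model,
    inputs=[
        ct.TensorType(shape=preprocessed_text.shape, dtype=np.int32)
    ],
)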