
"ValueError: Torch var mask.3 not found in context" when converting PyTorch model

Open AustinStarnes opened this issue 2 years ago • 2 comments

🐞Describing the bug

When attempting to convert the text model within Huggingface Transformers' version of CLIP, the conversion fails due to an unrecognized input for a triu op.

Stack Trace

Converting PyTorch Frontend ==> MIL Ops:   4%|██▌                                                               | 31/811 [00:00<00:00, 3040.63 ops/s]
Traceback (most recent call last):
  File "/Users/austin/Projects/openai-clip-trace-v/trace_clip.py", line 83, in <module>
    ml_model = ct.convert(
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/_converters_entry.py", line 451, in convert
    mlmodel = mil_convert(
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 193, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 220, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 283, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 115, in __call__
    return load(*args, **kwargs)
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 53, in load
    return _perform_torch_convert(converter, debug)
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 92, in _perform_torch_convert
    prog = converter.convert()
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 269, in convert
    convert_nodes(self.context, self.graph)
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 92, in convert_nodes
    add_op(context, node)
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 4031, in triu
    inputs = _get_inputs(context, node, expected=2)
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 190, in _get_inputs
    inputs = [context[name] for name in node.inputs]
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 190, in <listcomp>
    inputs = [context[name] for name in node.inputs]
  File "/Users/austin/miniconda3/envs/clipp/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 82, in __getitem__
    raise ValueError(
ValueError: Torch var mask.3 not found in context 

To Reproduce

Setup environment

conda create --name clip-triu-bug --yes \
    "pytorch::pytorch>=1.10,<1.11" \
    huggingface::transformers \
    python-dateutil \
    pillow
conda activate clip-triu-bug
pip3 install coremltools==6.0b2

Run clip_text_convert.py

import torch
import numpy as np
import coremltools as ct

from transformers.models.clip.processing_clip import CLIPProcessor
from transformers.models.clip.modeling_clip import CLIPModel

# Combine CLIP Text Model and Text Projection for exporting as one
# // Disabled text_projection as it's not needed to reproduce the error
class CLIPTextEmbedder(torch.nn.Module):
    def __init__(self, text_model):
        super(CLIPTextEmbedder, self).__init__()
        self.text_model = text_model
        # self.text_projection = text_projection
        
    def forward(self, input_ids):
        pooled_out = self.text_model(input_ids)[1]
        # embedded_out = self.text_projection(pooled_out)
        return pooled_out # embedded_out


# Load the pretrained CLIP Model from Huggingface Transformers
pretrained_identifier = 'openai/clip-vit-base-patch32'
processor = CLIPProcessor.from_pretrained(pretrained_identifier)
# return_dict=False so that it returns tuples... needed for tracing
model = CLIPModel.from_pretrained(pretrained_identifier, return_dict=False) 

# Isolate text embedding into separate models for export
text_for_export = CLIPTextEmbedder(model.text_model)#, model.text_projection)

# Pre-processing text for trace
text = "a painting of a corgi wearing a royal cape and crown"
# Since we're tracing, we should pad the text so that inputs of different lengths are pre-processed to a single input shape
preprocessed_text = processor(text=text, padding='max_length', return_tensors='pt').input_ids

with torch.no_grad():
    traced_model = torch.jit.trace(text_for_export, preprocessed_text)
    text_outputs = traced_model(preprocessed_text)
    ml_model = ct.convert(
        model=traced_model,
        inputs=[
            ct.TensorType(shape=preprocessed_text.shape, dtype=np.int64)
        ]
    )

# Export to CLIPTextEmbedder[openai_clip-vit-base-patch32].mlmodel
out_filename = f'{type(text_for_export).__name__}[{pretrained_identifier.replace("/", "_")}].mlmodel'
ml_model.save(out_filename)

System environment (please complete the following information):

  • coremltools version: 6.0b2
  • OS (e.g. MacOS version or Linux type): macOS Monterey v12.2.1
  • Any other relevant version information (e.g. PyTorch or TensorFlow version):
    • PyTorch version: 1.10.2
    • transformers version: 4.21.1

Additional context

  • Using coremltools version 5.2 yields a different error, since it lacks the changes from PR #1439 and PR #1454; those changes are what allow version 6.0b2 to progress further and then produce this error.

AustinStarnes avatar Aug 22 '22 02:08 AustinStarnes

@AustinStarnes - can you give us a minimal self-contained example, something that uses a much simpler model?

TobyRoseman avatar Aug 22 '22 22:08 TobyRoseman

You can work around this by patching the _build_causal_attention_mask() function in CLIPTextTransformer as follows:

def _build_causal_attention_mask(self, bsz, seq_len, dtype):
    # lazily create causal attention mask, with full attention between the vision tokens
    # pytorch uses additive attention mask; fill with -inf

    # Old code - gives a warning when traced that torch.tensor() is traced as constant
    # mask = torch.empty(bsz, seq_len, seq_len, dtype=dtype)
    # mask.fill_(torch.tensor(torch.finfo(dtype).min))

    # Replace with:
    mask = torch.full((bsz, seq_len, seq_len), torch.finfo(dtype).min)

    mask.triu_(1)  # zero out the lower diagonal
    mask = mask.unsqueeze(1)  # expand mask
    return mask
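
For what it's worth, the patch can also be applied as a monkey-patch instead of editing the installed package. A minimal sketch, assuming a transformers release (such as the 4.21.x used in the repro) in which CLIPTextTransformer still defines _build_causal_attention_mask:

import torch
from transformers.models.clip.modeling_clip import CLIPTextTransformer

def _patched_build_causal_attention_mask(self, bsz, seq_len, dtype):
    # Same mask as the original, but built with torch.full() so the tracer
    # does not capture a torch.tensor() call as a constant.
    mask = torch.full((bsz, seq_len, seq_len), torch.finfo(dtype).min)
    mask.triu_(1)             # zero out the lower diagonal
    mask = mask.unsqueeze(1)  # expand mask
    return mask

# Apply the patch before torch.jit.trace() / ct.convert() is called.
CLIPTextTransformer._build_causal_attention_mask = _patched_build_causal_attention_mask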

Indeed, given that the issue appears to be a result of the tracer, I'm not sure this is a coremltools bug?

gwjr avatar Sep 16 '22 11:09 gwjr

Same problem, is there any progress? @AustinStarnes

vecpeng avatar Feb 16 '23 07:02 vecpeng

This issue seems to be solved in coremltools==6.3.0. You have to use int32 instead of int64.
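
A minimal sketch of how the repro above might be adapted along those lines, reusing processor, text, text_for_export, and preprocessed_text from the script; the explicit .to(torch.int32) cast before tracing is an assumption:

with torch.no_grad():
    # Assumption: casting the token ids to int32 before tracing keeps the traced
    # graph consistent with the int32 Core ML input type.
    int32_ids = preprocessed_text.to(torch.int32)
    traced_model = torch.jit.trace(text_for_export, int32_ids)
    ml_model = ct.convert(
        model=traced_model,
        inputs=[
            ct.TensorType(shape=int32_ids.shape, dtype=np.int32)  # int32, not int64
        ]
    )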

fukatani avatar Jun 11 '23 06:06 fukatani