Compile error running pytorch-pretrained-bert
Using models from https://pypi.org/project/pytorch-pretrained-bert/ and running this script:
```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM
import torch_mlir
import logging

logging.basicConfig(level=logging.INFO)

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenized input
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)

# Mask a token that we will try to predict back with `BertForMaskedLM`
masked_index = 8
tokenized_text[masked_index] = '[MASK]'
assert tokenized_text == ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']

# Convert tokens to vocabulary indices
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Define sentence A and B indices associated with the 1st and 2nd sentences (see paper)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
print("Tokens = ", tokens_tensor)
print("Segments = ", segments_tensors)

bert = BertModel.from_pretrained('bert-base-uncased')
bert.eval()
print("BERT", bert)

# Predict hidden state features for each layer
with torch.no_grad():
    encoded_layers, _ = bert(tokens_tensor, segments_tensors)

# We have hidden states for each of the 12 layers in model bert-base-uncased
assert len(encoded_layers) == 12

module = torch_mlir.compile(bert, [tokens_tensor, segments_tensors], output_type=torch_mlir.OutputType.TORCH)
```
This gives an error during torch_mlir.compile:
```
RuntimeError: cannot statically infer the expected size of a list in this context:
  File "/mnt/swaters/iwa/torch-mlir.0/mlir_venv/lib/python3.8/site-packages/pytorch_pretrained_bert/modeling.py", line 298
        new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size)
        print("New Shape: ", new_x_shape, self.num_attention_heads, self.attention_head_size)
        x = x.view(*new_x_shape)
            ~~~~~~~~~~~~ <--- HERE
        return x.permute(0, 2, 1, 3)
'BertSelfAttention.transpose_for_scores' is being compiled since it was called from 'BertSelfAttention.forward'
  File "/mnt/swaters/iwa/torch-mlir.0/mlir_venv/lib/python3.8/site-packages/pytorch_pretrained_bert/modeling.py", line 306
        mixed_value_layer = self.value(hidden_states)

        query_layer = self.transpose_for_scores(mixed_query_layer)
                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        key_layer = self.transpose_for_scores(mixed_key_layer)
        value_layer = self.transpose_for_scores(mixed_value_layer)
```
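For context, the failing line concatenates `x.size()[:-1]` with a tuple and unpacks the result into `view`, so the length of the shape list is unknown to the compiler. A minimal sketch of the usual workaround, shown here as a standalone function for illustration rather than a patch shipped with the library, is to write the dimensions out explicitly:

```python
import torch

def transpose_for_scores(x: torch.Tensor,
                         num_attention_heads: int,
                         attention_head_size: int) -> torch.Tensor:
    # Same computation as the library method, but with every output
    # dimension written out, so the shape argument has a length the
    # compiler can see statically (no *list unpacking of unknown size).
    x = x.view(x.size(0), x.size(1), num_attention_heads, attention_head_size)
    return x.permute(0, 2, 1, 3)
```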
Have you been able to try HF_Bert? We use that for our tests. We will continue to debug this.
Thanks for this.
I put up some PRs that chip away at this model: https://github.com/llvm/torch-mlir/pull/824 and https://github.com/llvm/torch-mlir/pull/825
It looks like https://github.com/llvm/torch-mlir/pull/796 will also be needed for it. I will check back in on this once that lands.
> Have you been able to try HF_Bert? We use that for our tests. We will continue to debug this.
I have some time now, will give that a go. Thanks.
```python
import torch
from transformers import BertTokenizer, BertModel
import torch_mlir
import logging

logging.basicConfig(level=logging.INFO)

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenized input
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)

# Mask a token that we will try to predict back with `BertForMaskedLM`
masked_index = 8
tokenized_text[masked_index] = '[MASK]'
assert tokenized_text == ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']

# Convert tokens to vocabulary indices
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Define sentence A and B indices associated with the 1st and 2nd sentences (see paper)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
print("Tokens = ", tokens_tensor)
print("Segments = ", segments_tensors)

bert = BertModel.from_pretrained('bert-base-uncased', return_dict=False)
bert.eval()
# print("BERT", bert)

# Predict hidden state features for each layer
with torch.no_grad():
    encoded_layers, _ = bert(tokens_tensor, segments_tensors)

# We have hidden states for each of the 12 layers in model bert-base-uncased
# print("encoded_layers len = ", len(encoded_layers))
# assert len(encoded_layers) == 12

module = torch_mlir.compile(bert, [tokens_tensor, segments_tensors], output_type=torch_mlir.OutputType.TOSA, use_tracing=True)
```
The conversion to Torch IR seems to succeed, but the lowering to TOSA fails:
```
Lowering Torch Backend IR -> TOSA Backend IR failed with the following diagnostics:
error: unsupported by backend lowering: tensor with unknown rank or dtype
note: see current operation: %790 = "torch.prim.TupleIndex"(%789, %221) : (!torch.tuple<tensor<[1,14,768],f32>, tensor<[1,768],f32>>, !torch.int) -> !torch.vtensor
note: this is likely due to a missing shape transfer function in shape_lib_gen.py

Error can be reproduced with:
$ torch-mlir-opt -pass-pipeline='torch-backend-to-tosa-backend-pipeline' /tmp/BertModel.mlir
Add '-print-ir-after-all -mlir-disable-threading' to get the IR dump for debugging purpose.
```
It seems like the issue here is possibly related to multiple returns. Can you use a wrapper module that extracts the logits? See example here: https://github.com/google/iree-torch/blob/c3d7717ef4b9c83aa4870e949d9dee588e6e190d/examples/bert.py#L48
Do you need the other return values?
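For reference, here is a minimal sketch of such a wrapper, following the pattern in the linked example (the class name `OnlyLogitsWrapper` is made up for illustration):

```python
import torch

class OnlyLogitsWrapper(torch.nn.Module):
    """Wraps a model so forward() returns a single tensor."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, tokens_tensor, segments_tensors):
        # With return_dict=False, BertModel returns a tuple
        # (sequence_output, pooled_output); keeping only the first
        # element gives the compiled module one well-typed return value.
        return self.model(tokens_tensor, segments_tensors)[0]
```

You would then compile `OnlyLogitsWrapper(bert)` in place of `bert` with the same `torch_mlir.compile` call.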
> It seems like the issue here is possibly related to multiple returns. Can you use a wrapper module that extracts the logits? See example here: https://github.com/google/iree-torch/blob/c3d7717ef4b9c83aa4870e949d9dee588e6e190d/examples/bert.py#L48
> Do you need the other return values?
Thank you for your reply. I just need to convert PyTorch models (e.g. BERT, FastSpeech2, ...) to TOSA. The example above works well. I simply changed LINALG_ON_TENSORS to TOSA at https://github.com/google/iree-torch/blob/c3d7717ef4b9c83aa4870e949d9dee588e6e190d/examples/bert.py#L95 and used model_name="bert-base-cased". This gives the error below:
```
error: Integers with widths greater than 32 are not supported
note: see current operation: %435 = "torch.aten.add.Tensor"(%433, %405, %414) : (!torch.vtensor<[],si64>, !torch.vtensor<[],si64>, !torch.int) -> !torch.vtensor<[],si64>
error: failed to legalize operation 'torch.aten.add.Tensor' that was explicitly marked illegal
note: see current operation: %433 = "torch.aten.add.Tensor"(%432, %405, %414) : (!torch.vtensor<[],si64>, !torch.vtensor<[],si64>, !torch.int) -> !torch.vtensor<[],si64>

Error can be reproduced with:
$ torch-mlir-opt -pass-pipeline='torch-backend-to-tosa-backend-pipeline' /tmp/OnlyLogitsHuggingFaceModel.mlir
Add '-print-ir-after-all -mlir-disable-threading' to get the IR dump for debugging purpose.
```
@sjarus, how have you folks been dealing with the i64s in the BERT models?
si64 should be acceptable to TOSA. It's not an I/O type but should permit accumulation. I'll check the dialect form; we may also have local fixes to get us around this. I'm in the process of getting these out this week.
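Until such fixes land, one user-side workaround that sometimes avoids the si64 arithmetic is to trace with 32-bit index tensors. This is only a sketch, assuming your PyTorch build's embedding layers accept int32 indices; it does not remove every i64 the model may produce internally:

```python
import torch

# Hypothetical workaround: narrow the index inputs to int32 before
# tracing, so the traced graph carries si32 instead of si64 where possible.
tokens_tensor_i32 = tokens_tensor.to(torch.int32)
segments_tensors_i32 = segments_tensors.to(torch.int32)
```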
For lowering the BERT model to TOSA, I added a couple of passes to fix these errors. The 'aten.slice.Tensor' conversion seems to be missing.
> Have you been able to try HF_Bert? We use that for our tests. We will continue to debug this.
Are there any documents that show how your HF_Bert-based tests work? I am trying to use torch-mlir to compile a BERT demo based on the TOSA backend.