LayoutLMv2 & LayoutXLM cannot run inference with the Half (float16) dtype on CPU
Hi,
I wanted to run inference with LayoutXLM with its model parameters cast to the Half (float16) dtype on CPU (I did try it on GPU and it worked).
As I'm using Transformers from Hugging Face, I ran the following code:
```python
from transformers import LayoutLMv2ForTokenClassification
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
param_dtype = torch.float16

model_id = "pierreguillou/layout-xlm-base-finetuned-with-DocLayNet-base-at-paragraphlevel-ml512"
model = LayoutLMv2ForTokenClassification.from_pretrained(model_id, torch_dtype=param_dtype)
model.to(device)
```
That worked, but when I ran the model for inference with the following code, it failed:
```python
with torch.no_grad():
    output = model(
        input_ids=input_id.to(device),
        attention_mask=attention_mask.to(device),
        bbox=bbox.to(device),
        image=pixel_values.to(device),
    )
```
Error message:
```
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in layer_norm(input, normalized_shape, weight, bias, eps)
   2513         layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
   2514     )
-> 2515 return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
   2516
   2517

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
```
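For what it's worth, the failure seems reproducible outside LayoutLMv2 entirely. Here is a minimal sketch (assuming a stock PyTorch build without a Half CPU LayerNorm kernel) that triggers the same error:

```python
import torch

# Minimal sketch: call LayerNorm on a CPU float16 tensor.
# On PyTorch builds without a Half CPU kernel for layer_norm,
# this raises the same error as the traceback above.
x = torch.randn(2, 4, dtype=torch.float16)       # CPU tensor in half precision
ln = torch.nn.LayerNorm(4).to(torch.float16)     # half-precision LayerNorm weights
try:
    ln(x)
except RuntimeError as e:
    print(e)  # "LayerNormKernelImpl" not implemented for 'Half'
```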
It looks like the LayoutLMv2 code directly assumes the float32 dtype.
How can I solve this issue? Thanks.
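For reference, one workaround I would expect to help (an assumption, not tested against this exact checkpoint) is to load the model in bfloat16 instead, since PyTorch's CPU kernels, including LayerNorm, generally support bfloat16 even where float16 is missing:

```python
import torch
from transformers import LayoutLMv2ForTokenClassification

# Sketch of a possible CPU workaround (assumption: bfloat16 precision
# is acceptable for this use case and the visual backbone tolerates it).
model_id = "pierreguillou/layout-xlm-base-finetuned-with-DocLayNet-base-at-paragraphlevel-ml512"
model = LayoutLMv2ForTokenClassification.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.to("cpu")
```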