
Export to ONNX and use ONNX Runtime, working. Guide.

Kromtar opened this issue 2 years ago · 17 comments

This is an explanation of how to export the recognition and detection models to ONNX format, followed by a brief explanation of how to run these models with ONNX Runtime.

ONNX is an interoperability standard for AI models. It allows the same model to be used across different programming languages, operating systems, acceleration platforms and runtimes. Personally, I need to make a C++ build of EasyOCR's functionality. After failing, for several reasons, to make a C++ build using PyTorch and the EasyOCR models, I found that the best solution is to convert the models to ONNX and then program in C++ using ONNX Runtime. Compiling is then very easy compared to PyTorch.

Due to time constraints I am not presenting a PR. You will need to modify a copy of EasyOCR locally.

Requirements

We must install the onnx and onnxruntime modules. In my case I also had to manually pin the protobuf module to version 3.20.

I am using:

  • EasyOCR 1.5.0
  • Python 3.9.9
  • torch 1.10.1
  • torchvision 0.11.2
  • onnx 1.11.0
  • onnxruntime 1.11.1
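
For reference, a matching install might look something like this (the exact pins can vary by platform; the protobuf pin is the one I needed):

pip install onnx==1.11.0 onnxruntime==1.11.1 "protobuf==3.20.*"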

Exporting ONNX models

The best place to modify the EasyOCR code to export the models is right after EasyOCR uses the loaded model to perform the prediction.

Exporting detection model

In easyocr/detection.py after y, feature = net(x) (line 46) add:

    batch_size_1 = 500
    batch_size_2 = 500
    in_shape=[1, 3, batch_size_1, batch_size_2]
    dummy_input = torch.rand(in_shape)
    dummy_input = dummy_input.to(device)

    torch.onnx.export(
        net.module,
        dummy_input,
        "detectionModel.onnx",
        export_params=True,
        opset_version=11,
        input_names = ['input'],
        output_names = ['output'],
        dynamic_axes={'input' : {2 : 'batch_size_1', 3: 'batch_size_2'}},
    )

We generate a dummy, totally random input so that ONNX can perform the export. The values don't matter; the important thing is that the input has the correct structure. The detection model takes a 4-dimensional tensor, where the first dimension always has a value of 1, the second a value of 3, and the third and fourth depend on the resolution of the analyzed image. I reached this conclusion after analyzing the data flow; I may be in error and this may need to be corrected.

Note that we export the weights along with the graph (export_params=True) and specify that the last two dimensions of the input tensor are of dynamic size (dynamic_axes=...).
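
As an optional sanity check of my shape assumption, we can list the input dimensions that actually ended up in the exported file (this just reads detectionModel.onnx from the working directory):

import onnx

onnx_model = onnx.load("detectionModel.onnx")
for inp in onnx_model.graph.input:
    # dim_param holds the symbolic name of a dynamic axis, dim_value the fixed size
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)  # expected: input [1, 3, 'batch_size_1', 'batch_size_2']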

Then we can add this code to immediately import the exported model and validate that it is not corrupted:

onnx_model = onnx.load("detectionModel.onnx")
try:
    onnx.checker.check_model(onnx_model)
except onnx.checker.ValidationError as e:
    print('The model is invalid: %s' % e)
else:
    print('The model is valid!')

Remember to import onnx in the file header.

To run the export, just use EasyOCR and perform an analysis on any image, indicating the language to be detected. This will download the corresponding model, run the detection and export the model at the same time. If we change the language we will have to export a new model. Once the model is exported, we can comment out or delete the added code.
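
For example, a minimal driver script would be something like this (the image path is a placeholder; any image and language pair works):

import easyocr

reader = easyocr.Reader(['en'])   # downloads the models on first use
reader.readtext('example.png')    # runs detection.py, which triggers the export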

Exporting the recognition model

This model is a bit more difficult to export and we will have to do some black magic.

In easyocr/recognition.py after preds = model(image, text_for_pred) (line 111) add:

    batch_size_1_1 = 500
    in_shape_1=[1, 1, 64, batch_size_1_1]
    dummy_input_1 = torch.rand(in_shape_1)
    dummy_input_1 = dummy_input_1.to(device)

    batch_size_2_1 = 50
    in_shape_2=[1, batch_size_2_1]
    dummy_input_2 = torch.rand(in_shape_2)
    dummy_input_2 = dummy_input_2.to(device)

    dummy_input = (dummy_input_1, dummy_input_2)

    torch.onnx.export(
        model.module,
        dummy_input,
        "recognitionModel.onnx",
        export_params=True,
        opset_version=11,
        input_names = ['input1','input2'],
        output_names = ['output'],
        dynamic_axes={'input1' : {3 : 'batch_size_1_1'}},
    )

As with the detection model, we create a dummy input to be able to export the model. In this case, the model input has 2 elements.

The first element is a 4-dimensional tensor, where the first dimension always has a value of 1, the second a value of 1, the third a value of 64 and the fourth a dynamic value.

The second element is a 2-dimensional tensor, where the first dimension always has a value of 1 and the second a dynamic value.

Again, I may be wrong about the structure of these inputs; it is what I observed empirically.

First strange thing: for some reason, when ONNX analyzes the model structure, it concludes that the second input element does not perform any function. So even if we tell ONNX to export a model with 2 input elements, it will always export a model with 1 input element. It appears that this is due to an internal ONNX process that "cuts" parts of the graph defining the network that do not alter the network output. According to the documentation we can stop this "cutting" process and export the network without optimization by passing the do_constant_folding=False parameter, but due to a bug it is not taking effect. In spite of the above, we can observe that the lack of the second element does not cause any loss of accuracy in the model. For this reason, in the dynamic elements (dynamic_axes=) we only define one input, whose fourth dimension is variable in size. If anyone manages to export the model with the two input elements, it would be appreciated if you could notify us.
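
A quick way to confirm the dropped input is to list the inputs of the exported file:

import onnxruntime

session = onnxruntime.InferenceSession("recognitionModel.onnx")
print([i.name for i in session.get_inputs()])  # prints ['input1']; 'input2' was folded away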

Second strange thing: in order to export the recognition model, we must edit easyocr/model/vgg_model.py. It turns out that the AdaptiveAvgPool2d operator is not fully supported by ONNX export. When the "None" option is used in the configuration tuple (which indicates that the size must be equal to the input), the export fails. To fix this we need to change line 11:

From self.AdaptiveAvgPool = nn.AdaptiveAvgPool2d((None, 1)) to self.AdaptiveAvgPool = nn.AdaptiveAvgPool2d((256, 1))

Why 256? I don't know. Is there a better option? I have not found one. Does it generate errors in the model? I have not been able to find any accuracy problems. If someone can explain why with 256 it works and what the consequences are, it would be appreciated.
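
One observation that may be part of the answer (a sketch based on my reading of vgg_model.py; the feature-map shape below is an assumption): by the time the tensor reaches AdaptiveAvgPool2d it has been permuted so that the pooled "height" position holds the channel dimension, which is already 256 for this model. In that case (None, 1) and (256, 1) produce identical outputs:

import torch
import torch.nn as nn

# Assumed layout after the permute in vgg_model.py: (batch, width, channels, height),
# with channels equal to the output_channel network parameter (256 here).
x = torch.rand(1, 65, 256, 16)
print(nn.AdaptiveAvgPool2d((None, 1))(x).shape)  # torch.Size([1, 65, 256, 1])
print(nn.AdaptiveAvgPool2d((256, 1))(x).shape)   # torch.Size([1, 65, 256, 1])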

Well then, just like the detection model we can add these lines to validate the exported model:

onnx_model = onnx.load("recognitionModel.onnx")
try:
    onnx.checker.check_model(onnx_model)
except onnx.checker.ValidationError as e:
    print('The model is invalid: %s' % e)
else:
    print('The model is valid!')

Remember to import onnx in the file header.

To export the recognition model, we must run EasyOCR using any image and the desired language. During the process you will see that some warnings are generated, but you can ignore them. The model will be exported several times, since the added code sits inside a for loop, but this should not cause any problems. Remember to comment out or remove the added code afterwards. If you change language, you must export a new ONNX model.

Using ONNX models in EasyOCR

To test and validate that the models work, we will modify the code again. This time we will comment out the lines where EasyOCR runs the PyTorch prediction and add code that uses ONNX Runtime to perform the prediction instead.

Using the ONNX detection model

First we must add this helper function to the file easyocr/detection.py:

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

Then we must comment out line 46 where it says y, feature = net(x). After this line we must add:

ort_session = onnxruntime.InferenceSession("detectionModel.onnx")
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_outs = ort_session.run(None, ort_inputs)
y = ort_outs[0]

Remember to import onnxruntime in the file header.

In this way we load the detection ONNX model and pass the value "x" as input. Since ONNX Runtime does not use PyTorch, we must convert "x" from a Tensor to a standard numpy array; for that we use the helper function. The ONNX output is left in the "y" variable.

One last modification must be made on lines 51 and 52. Change from:

score_text = out[:, :, 0].cpu().data.numpy()
score_link = out[:, :, 1].cpu().data.numpy()

to

score_text = out[:, :, 0]
score_link = out[:, :, 1]

This is because the model output is already a numpy array and does not need to be converted from a Tensor.

To test, we can run EasyOCR with some image and see the result.

Using the ONNX recognition model

We must add the same helper function to the file easyocr/recognition.py:

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

Then we must comment out line 111, preds = model(image, text_for_pred), to stop using the PyTorch prediction, and right after it add:

ort_session = onnxruntime.InferenceSession("recognitionModel.onnx")
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(image)}
ort_outs = ort_session.run(None, ort_inputs)
preds = torch.from_numpy(ort_outs[0])

Remember to import onnxruntime in the file header.

Note that we are only passing one input, although in theory this model is supposed to receive two. As with the detection model, the input must be converted from a Tensor to a numpy array. We convert the output from an array back to a Tensor, so that the data flow continues normally.

To test, we can run EasyOCR with some image and see the result.

Others

We can use this function to compare the output of the PyTorch model and the ONNX model to quantify the difference:

np.testing.assert_allclose(to_numpy(<PYTORCH_PREDICTION>), <ONNX_PREDICTION>, rtol=1e-03, atol=1e-05)

In my tests, the difference between the detection models is minimal and passes the test correctly.

In the case of the recognition models, the difference is slightly larger and the test fails. Despite this, it fails by very little and I have not observed failures in the actual recognition of characters. I don't know if this is due to ONNX not detecting the two input entities, the AdaptiveAvgPool2d workaround, or just natural error from the model export and floating-point approximations.

Final note

I hope this will help the continued development of this excellent tool, and that experts in EasyOCR and PyTorch can review this and find answers to the questions raised.

Kromtar avatar Jun 05 '22 22:06 Kromtar

Looks great.

The pytorch library will take nearly 3GB, way too large to publish.

I managed to get a simple working demo with onnxruntime, and generated folder from pyinstaller is 376MB (167MB zipped).

Is there anyone else interested in implementing a runtime version of easyocr, with no torch dependency?

AutumnSun1996 avatar Jun 06 '22 07:06 AutumnSun1996

@AutumnSun1996 I'm working on it in my spare time.

I am having doubts about whether the PyTorch-to-ONNX conversion process I described in the guide reduces EasyOCR's accuracy and/or performance. Could you confirm whether your ONNX implementation shows the same accuracy and performance as PyTorch? Have you tested the performance of ONNX Runtime using CUDA?

Kromtar avatar Jun 07 '22 22:06 Kromtar

Found similar behavior for the exported model: there is a diff in the recognition model output, but the final text is the same. I did not test the performance, since my goal is to minimize package size. Maybe I can do some simple checks when I get some spare time.

AutumnSun1996 avatar Jun 15 '22 08:06 AutumnSun1996

@Kromtar can I ask you which version of PyTorch you have? I cannot export the Recognition Model, I get this error:

Traceback (most recent call last):
  File "/Users/itaybrenner/tesis/OnDevice/OnDevice/read_ocr.py", line 39, in <module>
    scan3 = detector.read_easyocr(image)
  File "/Users/itaybrenner/tesis/OnDevice/OnDevice/MachineLearning/LicenseReader.py", line 48, in read_easyocr
    result = self.easyocr.readtext(optimized_image, allowlist='ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/easyocr/easyocr.py", line 400, in readtext
    result = self.recognize(img_cv_grey, horizontal_list, free_list,\
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/easyocr/easyocr.py", line 330, in recognize
    result0 = get_text(self.character, imgH, int(max_width), self.recognizer, self.converter, image_list,\
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/easyocr/recognition.py", line 246, in get_text
    result1 = recognizer_predict(recognizer, converter, test_loader,batch_max_length,\
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/easyocr/recognition.py", line 128, in recognizer_predict
    torch.onnx.export(
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/torch/onnx/__init__.py", line 316, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 107, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 724, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 493, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 437, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 385, in _trace_and_get_graph_from_model
    orig_state_dict_keys = _unique_state_dict(model).keys()
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/torch/jit/_trace.py", line 71, in _unique_state_dict
    filtered_dict[k] = v.detach()
AttributeError: __torch__.torch.classes.rnn.CellParamsBase (of Python compilation unit at: 0x0) does not have a field with name 'detach'

Itaybre avatar Jun 24 '22 20:06 Itaybre

@Itaybre

python 3.9.9
torch 1.10.1
torchvision 0.11.2
onnx 1.11.0
onnxruntime 1.11.1

These are the versions of the main packages I have installed

Kromtar avatar Jun 24 '22 21:06 Kromtar

@Kromtar can I ask you which version of PyTorch you have? I cannot export the Recognition Model, I get this error: [same traceback as above]

hello, I'm having the same problems, did you solve this?

dovanhuong avatar Jul 05 '22 02:07 dovanhuong

@Kromtar can I ask you which version of PyTorch you have? I cannot export the Recognition Model, I get this error: [same traceback as above]

hello, I'm having the same problems, did you solve this?

Unfortunately no, I ended up exporting the default model to an older version of PyTorch to make it work on my environment

Itaybre avatar Jul 05 '22 15:07 Itaybre

Getting the same error here, with the same versions installed as specified by @Kromtar (thanks for your hard work on this btw, awesome that you managed to do this and hopefully more of us will follow!). If anybody has a solution it'd be great to hear - I'll keep exploring also and will post if I find something.

Also just another note that I got an error prior to this and had to change model.module to model when running torch.onnx.export() - this worked to export the detector.

@Kromtar can I ask you which version of PyTorch you have? I cannot export the Recognition Model, I get this error: [same traceback as above]

hello, I'm having the same problems, did you solve this?

MaxAntson avatar Jul 07 '22 18:07 MaxAntson

    dynamic_axes={'input' : {2 : 'batch_size_1', 3: 'batch_size_2'}},

Technically it would be clearer to call these 'height' and 'width' respectively. The first dimension is the batch size (1), the second the number of channels (3), the third the height and fourth the width as the input is in NCHW format.

FWIW if your input during inferencing will have a fixed height and width, there are potential performance benefits from leaving those dimensions as fixed sizes instead of using dynamic_axes for them. e.g. constant folding may be able to pre-calculate some values during model loading instead of during every inference.
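
Applied to the export call from the guide, that suggestion would look like this (a sketch, not something I have run against the rest of the steps):

# clearer names for the NCHW input axes
dynamic_axes = {'input': {2: 'height', 3: 'width'}}

# or, if the inference resolution is fixed, omit dynamic_axes entirely so constant
# folding can specialize the graph for the fixed 1 x 3 x H x W input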

skottmckay avatar Jul 11 '22 07:07 skottmckay

Hi everyone, thanks for the feedback. I have been incredibly busy the last 3 weeks. I finally have some free time.

I will look into the bug again to find a solution, since in my environment it is working perfectly.

Best regards.

Kromtar avatar Jul 11 '22 13:07 Kromtar

Hi, @Kromtar. What is the next step for testing with real images in mobile phones? I am sorry if it's too obvious, but I am not familiar with onnx at all and need to know what to do to test images using android/ios phones? Is there any environment where you connect phone with easyOCR? Thank you.

Also, since I am into it anyway, I'd love to learn more about every single step to make any ml/dl model work in mobile phones. I'd appreciate if you can share materials (posts, projects, etc.) related to this task. Thanks once again.

bit-scientist avatar Jul 13 '22 05:07 bit-scientist

Okay guys I found the source of the error mentioned by @Itaybre.

The problem is that torch.onnx.export, in the case of the recognition model, only works when EasyOCR is running in GPU mode (i.e. using CUDA cores). Apparently this is due to how ONNX parses and exports a very specific network layer used only by the recognition model. I have not been able to find a solution that does not involve making substantial changes to Torch.

My recommendation is to follow the guide, but make sure EasyOCR is running in GPU mode. For this we will be required to have an NVIDIA graphics card with CUDA and the corresponding drivers installed.

Soon I will publish a container with everything previously configured.

...I tried my best to make the export work without having to have a CUDA enabled card... but I didn't succeed, I'm sorry ;(

Kromtar avatar Jul 16 '22 23:07 Kromtar

I have created a new issue where I have made available the ONNX version of the EasyOCR models for all languages. Feel free to download and use them.

Kromtar avatar Jul 17 '22 02:07 Kromtar

I have published a branch in this fork where you can find the whole process using containers. You can see the readme to understand how to use it.

Kromtar avatar Jul 17 '22 04:07 Kromtar

@Kromtar Thank you for your conversion code. I have a dumb question: in your code for converting the recognition model to ONNX format I saw we have 2 inputs, but when I use the Netron app to preview the ONNX file I cannot find input_2 in the converted model. I also observe this in your converted models.

long-senpai avatar Aug 10 '22 11:08 long-senpai

@long-senpai I don't really know why this happens in the conversion process. I think that ONNX, when optimizing the model, discovers that the weights provided by input 2 are unnecessary; so it deletes them.

The last few weeks I have been working on comparing performances between the models before and after converting.

What I can confirm is that independent of input 2, the output of the converted model is the same as the output of the original model. So don't worry, there is no loss of performance.

Again, the origin of why ONNX does that, I don't know.

Kromtar avatar Aug 10 '22 13:08 Kromtar

@Kromtar does the detection model in ONNX format support running with a batch?

@Kromtar I tried exporting with a dynamic batch dimension and it worked.

Phelan164 avatar Aug 28 '22 17:08 Phelan164

Do you have exported ONNX models for generation2 to download directly? Can these ONNX files be used directly without EasyOCR? Is any dict file needed?

nissansz avatar Jan 15 '23 00:01 nissansz

My recommendation is to follow the guide, but make sure EasyOCR is running in GPU mode. For this we will be required to have an NVIDIA graphics card with CUDA and the corresponding drivers installed.

I found how to export the ONNX model with CPU only for the recognition model. In fact, quantization needs to be removed, so comment out these lines (in get_recognizer in easyocr/recognition.py):

if quantize:
    try:
        torch.quantization.quantize_dynamic(model, dtype=torch.qint8, inplace=True)
    except:
        pass

A2va avatar Jan 17 '23 12:01 A2va

Having troubles exporting the model

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/easyocr/lib/python3.9/site-packages/torch/onnx/symbolic_opset9.py", line 968, in symbolic_fn
    output_size = _parse_arg(output_size, "is")
  File "/home/user/anaconda3/envs/easyocr/lib/python3.9/site-packages/torch/onnx/symbolic_helper.py", line 83, in _parse_arg
    raise RuntimeError("Failed to export an ONNX attribute '" + v.node().kind() +
RuntimeError: Failed to export an ONNX attribute 'onnx::Gather', since it's not constant, please try to make things (e.g., kernel size) static if possible

samiechan avatar Feb 07 '23 16:02 samiechan

Nvm, I needed to update nn.AdaptiveAvgPool2d in my custom_model.py

samiechan avatar Feb 07 '23 16:02 samiechan

Here is another approach to export the CRAFT (detection) model to ONNX format on Windows using WSL2:

  1. Install conda on WSL2 Ubuntu 20.04
  2. Install the necessary libraries and create a new environment:
conda create -n easyocr
conda activate easyocr
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install onnxruntime-gpu
pip install easyocr
  3. Download the CRAFT model file from the EasyOCR release page:
wget https://github.com/JaidedAI/EasyOCR/releases/download/pre-v1.1.6/craft_mlt_25k.zip
unzip craft_mlt_25k.zip
  4. Load the CRAFT model and export it to ONNX format:
from easyocr import detection
import torch

# load model using CPU - default
model = detection.get_detector(trained_model='craft_mlt_25k.pth', device='cpu', quantize=False)
# load model using GPU - for custom models if trained on gpu
# model = detection.get_detector(trained_model='craft_mlt_25k.pth', device='cuda:0', quantize=False)
dummy_input = torch.randn(1, 3, 384, 512)
torch.onnx.export(model, dummy_input, "craft.onnx")
  5. Perform text detection with the ONNX model using ONNX Runtime (also see https://github.com/clovaai/CRAFT-pytorch/issues/4):
import torch
import onnxruntime as rt
import cv2
import numpy as np
from easyocr.craft_utils import getDetBoxes, adjustResultCoordinates
from easyocr.imgproc import resize_aspect_ratio, normalizeMeanVariance
from easyocr.utils import reformat_input

# Read input image
img, _ = reformat_input('https://jeroen.github.io/images/testocr.png')

# Resize and normalize input image
img_resized, target_ratio, size_heatmap = resize_aspect_ratio(img, 512, interpolation=cv2.INTER_LINEAR, mag_ratio=1.)
ratio_h = ratio_w = 1 / target_ratio
x = normalizeMeanVariance(img_resized)
x = torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)

# Create ONNX Runtime session and load model
providers = ['CPUExecutionProvider']
session = rt.InferenceSession("craft.onnx", providers=providers)
input_name = session.get_inputs()[0].name

# Prepare input tensor for inference
inp = {input_name: x.numpy()}

# Run inference and get output
y, _ = session.run(None, inp)

# Extract score and link maps
score_text = y[0, :, :, 0]
score_link = y[0, :, :, 1]

# Post-processing to obtain bounding boxes and polygons
boxes, polys, mapper = getDetBoxes(score_text, score_link, 0.5, 0.4, 0.4)
boxes = adjustResultCoordinates(boxes, ratio_w, ratio_h)
polys = adjustResultCoordinates(polys, ratio_w, ratio_h)

You can use the craft-text-detector export_detected_regions function to export bounding boxes as cropped images (there are issues with polygons - you need to pass poly=True to getDetBoxes to get values).
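
A rough usage sketch (the signature is from memory and may differ between craft-text-detector versions, so treat it as an assumption):

from craft_text_detector import export_detected_regions

# img and polys come from the detection steps above; rectify straightens the
# polygons into rectangular crops before writing them to disk
exported_paths = export_detected_regions(image=img, regions=polys,
                                         output_dir='outputs/', rectify=True)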

samiechan avatar Feb 22 '23 09:02 samiechan

@Kromtar

Why 256? I don't know. Is there a better option? I have not found one. Does it generate errors in the model? I have not been able to find any accuracy problems. If someone can explain why with 256 it works and what the consequences are, it would be appreciated.

It is related to network parameters, I believe.

network_params = {
    'input_channel': 1,
    'output_channel': 256,
    'hidden_size': 256
    }

You can find it here.

samiechan avatar Feb 23 '23 14:02 samiechan

Here is an approach for the recognition model:

  1. Download the desired recognition language model. Example: wget https://github.com/JaidedAI/EasyOCR/releases/download/v1.6.1/cyrillic_g2.zip
  2. Provide language model configuration:
from easyocr import recognition
#import yaml
import os

recog_network = 'generation2'

# for custom model
#with open(recog_network + '.yaml', encoding='utf8') as file:
  #recog_config = yaml.load(file, Loader=yaml.FullLoader)

#network_params = recog_config['network_params']

network_params = {
    'input_channel': 1,
    'output_channel': 256,
    'hidden_size': 256
    }

# for custom model
#character = recog_config['character_list']


# see https://github.com/JaidedAI/EasyOCR/blob/ca9f9b0ac081f2874a603a5614ddaf9de40ac339/easyocr/config.py for other language config examples
character = '0123456789!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ €₽ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзийклмнопрстуфхцчшщъыьэюяЂђЃѓЄєІіЇїЈјЉљЊњЋћЌќЎўЏџҐґҒғҚқҮүҲҳҶҷӀӏӢӣӨөӮӯ'
symbol = '0123456789!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ €₽'
model_path = "cyrillic_g2.pth"
separator_list = {}
cyrillic_lang_list = ['ru','rs_cyrillic','be','bg','uk','mn','abq','ady','kbd',\
                      'ava','dar','inh','che','lbe','lez','tab','tjk', 'en']
package_dir = os.path.dirname(recognition.__file__)

dict_list = {}
for lang in cyrillic_lang_list:
    dict_list[lang] = os.path.join(package_dir, 'dict', lang + ".txt")

model, converter = recognition.get_recognizer(recog_network=recog_network, network_params=network_params, character=character, separator_list=separator_list, dict_list=dict_list, model_path=model_path, device='cpu', quantize=False)
  3. Export recognition model to ONNX:
import torch
import torchvision.transforms as transforms

# Define the dimensions of the input image
batch_size = 1
num_channels = 1
image_height = imgH = 64
image_width = 128
device = 'cpu'

# Create dummy input tensors for the image and text inputs
dummy_input = torch.randn(batch_size, num_channels, image_height, image_width)

# Define the maximum length of the text input
max_text_length = 10

dummy_text_input = torch.LongTensor(max_text_length, batch_size).random_(0, 10)

# Convert the input image to grayscale
grayscale_transform = transforms.Grayscale(num_output_channels=1)
grayscale_input = grayscale_transform(dummy_input)

input_names = ["image_input", "text_input"]
output_names = ["output"]
dynamic_axes = {"image_input": {0: "batch_size"}, "text_input": {1: "batch_size"}}
opset_version = 12

torch.onnx.export(model, (grayscale_input, dummy_text_input), "recog.onnx", 
                  input_names=input_names, output_names=output_names, 
                  dynamic_axes=dynamic_axes, opset_version=opset_version)
  4. Modify recognizer_predict to run model inference using ONNX Runtime (remember to import onnxruntime as rt): replace preds = model(image, text_for_pred) with
providers = ['CPUExecutionProvider']
session = rt.InferenceSession("recog.onnx", providers=providers)
inputs = session.get_inputs()
inp = {inputs[0].name: image.numpy()}
preds = session.run(None, inp)
preds = torch.from_numpy(preds[0])

Here is an example of how to run it with one cropped image from the detection model:

import torch
import onnxruntime as rt
import numpy as np
from easyocr.utils import reformat_input, get_image_list

# read image
img, img_cv_grey = reformat_input('/content/outputs/image_crops/crop_0.png')

y_max, x_max = img_cv_grey.shape

horizontal_list = [[0, x_max, 0, y_max]]

lang = 'ru'  # assumption: pick any entry from cyrillic_lang_list above
lang_char = []
char_file = os.path.join(package_dir, 'character', lang + "_char.txt")
with open(char_file, "r", encoding = "utf-8-sig") as input_file:
  char_list =  input_file.read().splitlines()
lang_char += char_list
lang_char = set(lang_char).union(set(symbol))

ignore_char = ''.join(set(character)-set(lang_char))

result = []

for bbox in horizontal_list:
    h_list = [bbox]
    f_list = []
    image_list, max_width = get_image_list(h_list, f_list, img_cv_grey, model_height=64) # 64 is default value
    result0 = get_text(character, imgH, int(max_width), converter, image_list,\
                              ignore_char, 'greedy', beamWidth = 5, batch_size = batch_size, contrast_ths = 0.1, adjust_contrast = 0.5, filter_ths = 0.003,\
                              workers = 0, device = device)
    result += result0

Also for custom models you can use GPU: providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if rt.get_device()=='GPU' else ['CPUExecutionProvider']

Also yes, the ONNX model has only 1 input.

samiechan avatar Feb 24 '23 23:02 samiechan

@samiechan The ONNX image_width is not dynamic; if our input image width exceeds it, we'll get an error. Is there a way to make it dynamic?

light42 avatar Feb 27 '23 04:02 light42

@samiechan The ONNX image_width is not dynamic; if our input image width exceeds it, we'll get an error. Is there a way to make it dynamic?

For detection model:

from easyocr import detection
import torch

model = detection.get_detector(trained_model='craft_mlt_25k.pth', device='cpu', quantize=False)

input_shape = (1, 3, 480, 640)
inputs = torch.ones(*input_shape)
input_names=['input']
output_names=['output']

dynamic_axes= {'input':{0:'batch_size', 2:'height', 3:'width'}, 'output':{0:'batch_size', 2:'height', 3:'width'}} #adding names for better debugging
torch.onnx.export(model, inputs, "craft.onnx", dynamic_axes=dynamic_axes, input_names=input_names, output_names=output_names)

For recognition model:

import torch
import torchvision.transforms as transforms

# Define the dimensions of the input image
batch_size = 1
num_channels = 1
image_height = imgH = 64
image_width = 128

image_input_shape = (batch_size, 1, image_height, image_width)
image_input = torch.ones(*image_input_shape)

max_text_length = 10
text_input_shape = (batch_size, max_text_length)
text_input = torch.ones(*text_input_shape)

input_names=['image_input', 'text_input']
output_names=['output']

dynamic_axes = {"image_input": {0: "batch_size", 3: "width"}, "text_input": {0: "batch_size"}}
opset_version = 12

torch.onnx.export(model, (image_input, text_input), "recog.onnx", 
                  input_names=input_names, output_names=output_names, 
                  dynamic_axes=dynamic_axes, opset_version=opset_version)
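
As a quick sanity check (my own sketch, assuming recog.onnx was exported with the dynamic 'width' axis above and ends up with only the image input), the same session should now accept different widths:

import numpy as np
import onnxruntime as rt

sess = rt.InferenceSession("recog.onnx", providers=['CPUExecutionProvider'])
name = sess.get_inputs()[0].name
for w in (128, 320):
    out = sess.run(None, {name: np.random.rand(1, 1, 64, w).astype(np.float32)})
    print(w, out[0].shape)  # the output sequence length grows with the width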

samiechan avatar Feb 27 '23 12:02 samiechan

It's important to note that there are two aspects here. One is whether the model inputs have fixed or dynamic sizes, and the other is whether the model itself supports dynamic sizes. e.g. if the model was trained with input of 64 x 128 and does not internally resize input, you will most likely get an error about sizes mismatching as the nodes and weights in the model will be expecting sizes relative to 64 x 128.

Basically the model inputs being dynamic allows any value to be specified but that does not mean the model itself supports any value.

If you are pre-processing the image prior to running the original pytorch model (e.g. resize, crop, normalize) you need to do the same things prior to running the exported ONNX model.

We have some new helpers that are about to be released that may be helpful. They allow adding these common pre-processing steps into the ONNX model so that onnxruntime can do them. The latest onnxruntime (1.14) also includes the updated ONNX Resize operator that supports anti-aliasing, providing equivalency with the typical Pillow-based image resizing used by PyTorch.

Overview documentation: https://github.com/microsoft/onnxruntime-extensions/blob/main/onnxruntime_extensions/tools/Example%20usage%20of%20the%20PrePostProcessor.md

There are some example implementations, including one showing what the pre-processing pipeline would look like for a model with the common pytorch image pre-processing here: https://github.com/microsoft/onnxruntime-extensions/blob/7578af836146b015bbd7a8539f3288cc539660ad/onnxruntime_extensions/tools/add_pre_post_processing_to_model.py#L23

It's also possible to do image conversion from png or jpg as part of the pre-processing, although that requires the onnxruntime-extensions library to be available at runtime as it uses a custom operator (i.e. not an operator defined in the ONNX spec). We have prebuilt android and ios packages for onnxruntime-extensions in this first release of the new tools, so you'd have to build it yourself for other platforms.

skottmckay avatar Feb 27 '23 22:02 skottmckay

Here is an approach for the recognition model: [full walkthrough quoted above]

@samiechan Hello. Can you please tell me the versions of your libraries? Your code gave me an error on pytorch 1.9.1:

RuntimeError Traceback (most recent call last)
...
RuntimeError: Unsupported: ONNX export of operator adaptive pooling, since output_size is not constant. Please feel free to request support or submit a pull request on PyTorch GitHub.

kadmor avatar Mar 01 '23 06:03 kadmor

[the dynamic-width recognition export snippet is quoted in full above]

Here is my Google Colab: https://colab.research.google.com/drive/1pcoueUxhWFX5Ac6AA4paYDLgZMf819GT?usp=sharing

samiechan avatar Mar 03 '23 01:03 samiechan

@samiechan Thank you very much!

kadmor avatar Mar 03 '23 06:03 kadmor