
Error: "Make sure `_init_weights` is implemented for <class 'donut.model.DonutModel'>"

Open csanadpoda opened this issue 1 year ago • 6 comments

I've tried using the official pretrained CORD model, both v1 and v2. I set up a virtual environment with Python 3.7 and added the dependencies as specified in the README:

torch == 1.11.0+cu113
torchvision == 0.12.0+cu113
pytorch-lightning == 1.6.4
transformers == 4.11.3
timm == 0.5.4

As mentioned in another issue, I also tried upgrading to transformers==4.25.1, but I'm still getting: "Make sure _init_weights is implemented for <class 'donut.model.DonutModel'>"
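For anyone comparing environments across the pin sets quoted in this thread: a small stdlib-only sketch that reports installed versions that differ from a pin. The helper is my own convenience code, not part of donut; the pin values are the ones quoted above.

```python
# Sketch: report packages whose installed version differs from a pin.
# Standard library only (Python 3.8+; on 3.7 use the importlib_metadata backport).
# Local version tags like "+cu113" are ignored when comparing.
from importlib.metadata import version, PackageNotFoundError

PINS = {
    "torch": "1.11.0",
    "torchvision": "0.12.0",
    "pytorch-lightning": "1.6.4",
    "transformers": "4.11.3",
    "timm": "0.5.4",
}

def check_pins(pins):
    """Return (package, expected, found) tuples for every mismatch.

    found is None when the package is not installed at all.
    """
    mismatches = []
    for name, expected in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        # Compare only the public part of the version, dropping "+cu113" etc.
        if found is None or found.split("+")[0] != expected:
            mismatches.append((name, expected, found))
    return mismatches

if __name__ == "__main__":
    for name, expected, found in check_pins(PINS):
        print(f"{name}: expected {expected}, found {found}")
```

Running it in the virtualenv prints one line per mismatched or missing package, which makes it quick to see whether the environment really matches the README pins.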

This is both with naver-clova-ix/donut-base-finetuned-cord-v1 and naver-clova-ix/donut-base-finetuned-cord-v2.

To clarify: this happens when I try to load the pretrained model with DonutModel.from_pretrained(PRETRAINED_MODEL_PATH), where PRETRAINED_MODEL_PATH points to a location where I cloned naver-clova-ix/donut-base-finetuned-cord-v1 or -v2 from Hugging Face with git.

csanadpoda avatar Apr 25 '23 18:04 csanadpoda

Hi @csanadpoda, I guess this issue might be caused by not cloning the official branch of the repo. Here's the link: https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v1/tree/official Please let me know if you are still confused ;)
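The fix, then, is `git clone --branch official https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v1` instead of cloning the default branch (plus `git lfs install` first, so the weight files are actually fetched rather than LFS pointer stubs). As a self-contained, offline illustration of what `--branch` does, the sketch below builds a throwaway local repo with an official branch and clones it:

```shell
# Offline illustration of `git clone --branch`: the clone checks out the
# named branch rather than the repository's default branch.
git -c init.defaultBranch=main init -q demo-remote
git -C demo-remote -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"
git -C demo-remote branch official
git clone -q --branch official demo-remote demo-clone
git -C demo-clone rev-parse --abbrev-ref HEAD
```

The final command prints the checked-out branch of the clone, which is `official`, not the default branch.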

gwkrsrch avatar Apr 26 '23 01:04 gwkrsrch

Well yeah, not having the main branch compatible with your software is a bit confusing haha

Works fine when cloning from official, thanks!

csanadpoda avatar Apr 27 '23 12:04 csanadpoda

I have a similar issue: I am loading a trained model from a path using DonutModel.from_pretrained(saved_model_path)

trained on python 3.7.12:
donut-python = "1.0.9"
torch = "1.12.1"  # 1.13.1 brings fat nvidia libs
torchvision = "0.13.1"
transformers = "4.21.3"

warlord1710 avatar May 15 '23 18:05 warlord1710

pip uninstall timm
pip install timm==0.5.4

zonasw avatar Jun 01 '23 10:06 zonasw

@gwkrsrch

I have already tried cloning the official branch of the repo from huggingface.co, but I'm still getting "Make sure _init_weights is implemented for <class 'donut.model.DonutModel'>".

[screenshot of the same error]

asfansajid123 avatar Aug 30 '23 16:08 asfansajid123

Hello @gwkrsrch, @zonasw, @warlord1710, @csanadpoda,

I've encountered an issue with TorchServe when attempting to load a model that I trained using the donut-base checkpoint from Hugging Face.

Background:

I initially used the mentioned base checkpoint to train a model. However, I wasn't satisfied with the accuracy, so I proceeded to train the model again using additional datasets. Importantly, for this round of training, I used the checkpoints from the initial training as my starting point. After this training session, I achieved satisfactory accuracy and was able to perform inferences locally using the obtained checkpoints.

Issue:

When I tried to deploy this model using TorchServe, I created a .mar file and started the TorchServe server. However, I noticed that all my workers were stuck in an UNLOADING status. Checking the logs, I found an error that stated:

Failed to load model ocr_donut_model, exception Make sure '_init_weights' is implemented for <class 'donut.model.DonutModel'>

This is perplexing because I can load these model checkpoints locally without any issues. So, I'm not sure why TorchServe is having difficulties.

Additional Info:

I've made sure to include the install_py_dep_per_model setting in my config.properties.
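One thing worth double-checking: with install_py_dep_per_model=true, TorchServe installs only what is listed in the requirements file bundled into the .mar (passed to torch-model-archiver via its --requirements-file flag). If donut-python and its pinned dependencies are missing from that file, the worker process cannot reconstruct DonutModel even though the same checkpoints load fine locally. A sketch of such a file (the package versions here are assumptions, echoing pins quoted earlier in this thread, not values from your setup):

```
# requirements.txt to bundle into the .mar:
#   torch-model-archiver ... --requirements-file requirements.txt
# Versions below are assumptions taken from pins mentioned earlier in the thread.
donut-python==1.0.9
timm==0.5.4
transformers==4.21.3
```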

Request:

I kindly request guidance on this issue. Why might there be a discrepancy between the local loading of checkpoints and the behavior in TorchServe?

config.properties

donut_model=1.0
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store
workflow_store=/home/model-server/wf-store
install_py_dep_per_model=true
min_worker=4
max_worker=10

Thank you for your assistance!

Codedrainer avatar Sep 27 '23 06:09 Codedrainer