donut
Error: "Make sure `_init_weights` is implemented for <class 'donut.model.DonutModel'>"
I've tried using the official pretrained CORD model, both v1 and v2. I set up a virtual environment with Python 3.7 and added the dependencies as specified in the README:
torch == 1.11.0+cu113
torchvision == 0.12.0+cu113
pytorch-lightning == 1.6.4
transformers == 4.11.3
timm == 0.5.4
As mentioned in another issue, I also tried upgrading to transformers==4.25.1, but I'm still getting: "Make sure `_init_weights` is implemented for <class 'donut.model.DonutModel'>"
This is both with naver-clova-ix/donut-base-finetuned-cord-v1 and naver-clova-ix/donut-base-finetuned-cord-v2.
To clarify: this happens when I try to load the pretrained model with DonutModel.from_pretrained(PRETRAINED_MODEL_PATH), where PRETRAINED_MODEL_PATH points to a location where I cloned naver-clova-ix/donut-base-finetuned-cord-v1 or v2 from Hugging Face with git.
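For context on where the error comes from: the message is raised by the Hugging Face transformers base class when from_pretrained finds weights missing from the checkpoint and falls back to the model's `_init_weights`, which donut.model.DonutModel does not override. Below is a simplified, self-contained sketch of that mechanism; it is not the actual transformers or donut code, and all class and key names are illustrative:

```python
# Simplified sketch (NOT transformers' real implementation) of why a
# mismatched checkpoint produces "Make sure `_init_weights` is implemented".
class PreTrainedModelSketch:
    def _init_weights(self, module):
        # The base class expects subclasses to override this; DonutModel
        # doesn't, so any attempt to initialize missing weights raises.
        raise NotImplementedError(
            f"Make sure `_init_weights` is implemented for {self.__class__}"
        )

    def load(self, state_dict, expected_keys):
        # Keys the model expects but the checkpoint doesn't provide
        # (e.g. because the wrong branch of the repo was cloned).
        missing = [k for k in expected_keys if k not in state_dict]
        for key in missing:
            self._init_weights(key)  # fallback initialization -> raises here
        return missing


class DonutModelSketch(PreTrainedModelSketch):
    pass  # no _init_weights override, like donut.model.DonutModel


model = DonutModelSketch()
model.load({"encoder.w": 1}, ["encoder.w"])  # all keys present: loads fine

try:
    # Checkpoint is missing "decoder.w" -> triggers the error from the thread.
    model.load({"encoder.w": 1}, ["encoder.w", "decoder.w"])
except NotImplementedError as e:
    print(e)
```

The takeaway is that the error is usually a symptom of a checkpoint/code mismatch, not a bug in your loading call.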
Hi @csanadpoda, I guess this issue might be caused by not cloning the official branch of the repo. Here's the link: https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v1/tree/official Please let me know if you are still confused ;)
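The fix above can be sketched as a pair of shell commands (repo URL and branch name are from this thread; the directory name is whatever git creates by default):

```shell
# Clone the "official" branch explicitly instead of the default "main" branch,
# whose weights are reportedly incompatible with donut-python 1.0.x.
git clone -b official https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v1

# Sanity check: confirm the checkout is on the "official" branch.
git -C donut-base-finetuned-cord-v1 symbolic-ref --short HEAD
```

Then point DonutModel.from_pretrained at the cloned directory as before.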
Well yeah, having a main branch that isn't compatible with your software is a bit confusing haha
Works fine when cloning from official, thanks!
I have a similar issue. I'm loading a trained model from a path using DonutModel.from_pretrained(saved_model_path), trained on:

python 3.7.12
donut-python = "1.0.9"
torch = "1.12.1"  # 1.13.1 brings fat nvidia libs
torchvision = "0.13.1"
transformers = "4.21.3"

pip uninstall timm
pip install timm==0.5.4
@gwkrsrch
I have already tried cloning the official branch of the repo from huggingface.co, but I am still getting "Make sure `_init_weights` is implemented for <class 'donut.model.DonutModel'>"
Hello @gwkrsrch, @zonasw, @warlord1710, @csanadpoda,
I've encountered an issue with TorchServe when attempting to load a model that I trained using the donut-base checkpoint from Hugging Face.
Background:
I initially used the mentioned base checkpoint to train a model. However, I wasn't satisfied with the accuracy, so I proceeded to train the model again using additional datasets. Importantly, for this round of training, I used the checkpoints from the initial training as my starting point. After this training session, I achieved satisfactory accuracy and was able to perform inferences locally using the obtained checkpoints.
Issue:
When I tried to deploy this model using TorchServe, I created a .mar file and started the TorchServe server. However, I noticed that all my workers were stuck in an UNLOADING status. Checking the logs, I found an error that stated:
Failed to load model ocr_donut_model, exception Make sure '_init_weights' is implemented for <class 'donut.model.DonutModel'>
This is perplexing because I can load these model checkpoints locally without any issues. So, I'm not sure why TorchServe is having difficulties.
Additional Info:
I've made sure to include the install_py_dep_per_model setting in my config.properties.
Request:
I kindly request guidance on this issue. Why might there be a discrepancy between the local loading of checkpoints and the behavior in TorchServe?
config.properties:

donut_model=1.0
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store
workflow_store=/home/model-server/wf-store
install_py_dep_per_model=true
min_worker=4
max_worker=10
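Since the model loads locally but not under TorchServe, one thing worth checking is whether the two Python environments actually match; a version drift in transformers or timm between the local venv and the TorchServe worker environment would reproduce exactly this split behavior. A quick way to compare them (assuming pip is on PATH in both environments):

```shell
# In the local venv where loading works:
pip freeze | sort > local-env.txt

# Inside the TorchServe container/venv where the workers run:
pip freeze | sort > serve-env.txt

# Any difference in transformers/timm/torch pins is a likely culprit.
diff local-env.txt serve-env.txt
```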
Thank you for your assistance!