Geewook Kim comments

Results: 22 comments of Geewook Kim

How much gpu memory size does single 1280,960 size photo need

Closing this issue since it seems to be resolved :) Feel free to reopen it or open another issue if you have anything new to share or debug.

Different input resolution throws error

Hi, this issue is related to the `window_size` of the image encoder (swin). For `donut-base`, set the size of each axis to a multiple of 320, e.g., [640, 640], [960,...
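As a rough illustration of the multiple-of-320 rule described in the comment above, the sketch below snaps an arbitrary input size to the nearest larger size that satisfies it. The helper names `round_up_to_multiple` and `snap_input_size` are hypothetical and not part of the Donut codebase, and rounding up (rather than down) is an assumption made only for this example.

```python
def round_up_to_multiple(value: int, base: int = 320) -> int:
    """Round a positive integer up to the nearest multiple of `base`.

    Hypothetical helper for illustration; 320 is the per-axis multiple
    mentioned for `donut-base` in the comment above.
    """
    if value <= 0:
        raise ValueError("value must be positive")
    # Integer ceiling division avoids floating-point rounding issues.
    return (value + base - 1) // base * base


def snap_input_size(size, base: int = 320):
    """Apply the rounding to each axis of a two-element size list."""
    return [round_up_to_multiple(v, base) for v in size]


if __name__ == "__main__":
    # A size that already satisfies the rule is left unchanged.
    print(snap_input_size([640, 640]))
    # A size that does not satisfy the rule is rounded up per axis.
    print(snap_input_size([1000, 700]))
```

Rounding up preserves all of the image content at the cost of extra memory; rounding down would be the cheaper alternative if some downscaling is acceptable.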

Neither of the DocVQA Task1 (Document VQA) demos work

Hi @David-McSharry, thank you for bringing this issue to our attention. We have received multiple reports regarding challenges in configuring the testing environment for `donut-python` due to recent updates in...

How to load checkpoint?

Hi @Vadkoz, The current trainer removes `state_dict` in the ckpt files (check https://github.com/clovaai/donut/blob/1.0.9/train.py#L29-L31). However, it saves the model weights in HF's transformers format (check https://github.com/clovaai/donut/blob/1.0.9/lightning_module.py#L146-L150). To load the model weights,...

How did training with a batch size of 8 fit onto a single A100?

Hi @csanadpoda, yes, we used fp16 (https://github.com/clovaai/donut/blob/master/train.py#L127). Hope this helps ;)

Error: "Make sure `_init_weights` is implemented for <class 'donut.model.DonutModel'>"

Hi @csanadpoda, I guess this issue might be caused by not cloning the official branch of the repo. Here's the link: https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v1/tree/official Please let me know if you are still...

It seems that load_dataset is very slow to load about 11M images. How did you solve it?

Hi @YuanEZhou, this might be helpful to you: https://github.com/clovaai/donut/issues/23 Best.

ASCII only output during training

Hi, thank you for your interest in our work :) Let me give a quick answer first: yes, it would be possible by removing unnecessary tokens in the vocabulary...

ASCII only output during training

I think there are many options to implement this feature. The first is to remove unnecessary tokens in the vocabulary. For this, you should update the vocabulary of the tokenizer and...

size mismatch for encoder.model.layers.1.downsample.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).

Hi, thank you for bringing this issue to our attention. It appears that the problem is likely related to the environment configuration. We will resolve this issue, while also updating...