donut
How many minimum images are required for training?
Hello @SamSamhuns, @gwkrsrch, @VictorAtPL. I have around 60 images and 8 custom tokens. Each image contains 3-4 of the same keys but with different values, and the annotation format is like SROIE. I followed this link, converted my data to the structure below, and followed the converter script as mentioned in the blog:
[
  {
    "Name": "Tom",
    "Buyer": "Conda",
    "contact_number": "989898989898",
    "alt_number": "55555555",
    "Buyer_id": "9856321023"
  },
  {
    "Name": "Hanks",
    "Buyer": "Conda",
    "contact_number": "99999999999",
    "alt_number": "25823102",
    "Buyer_id": "9856321024"
  },
  {
    "Name": "Lita",
    "Buyer": "Conda",
    "contact_number": "4545858402",
    "alt_number": "12121212121",
    "Buyer_id": "9856321022"
  }
]
My metadata.jsonl
{"file_name": "1.png", "ground_truth": "{\"gt_parse\": [{\"Name\": \"Tom\", \"Buyer\": \"Conda\", \"contact_number\": \"989898989898\", \"alt_number\": \"55555555\", \"Buyer_id\": \"9856321023\"}, {\"Name\": \"Hanks\", \"Buyer\": \"Conda\", \"contact_number\": \"99999999999\", \"alt_number\": \"25823102\", \"Buyer_id\": \"9856321024\"}, {\"Name\": \"Lita\", \"Buyer\": \"Conda\", \"contact_number\": \"4545858402\", \"alt_number\": \"12121212121\", \"Buyer_id\": \"9856321022\"}]}"}
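As a quick sanity check (a minimal sketch; the helper name is mine, not part of the Donut codebase), each metadata.jsonl line should be valid JSON whose ground_truth field is itself a JSON string that parses cleanly:

```python
import json

def check_metadata_line(line: str) -> dict:
    """Parse one metadata.jsonl line and return the decoded ground truth."""
    record = json.loads(line)
    # ground_truth is stored as a JSON-encoded string inside the record
    return json.loads(record["ground_truth"])

# A shortened version of the line above, for illustration
line = '{"file_name": "1.png", "ground_truth": "{\\"gt_parse\\": [{\\"Name\\": \\"Tom\\"}]}"}'
gt = check_metadata_line(line)
# Note: gt["gt_parse"] is a list here, which is exactly what trips
# the isinstance(..., dict) assert in donut/util.py later in this thread
print(type(gt["gt_parse"]))
```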
This is my config. My image sizes are variable: max (2205 x 1693), min (1755 x 779).
resume_from_checkpoint_path: null # only used for resume_from_checkpoint option in PL
result_path: "/content/drive/MyDrive/results"
pretrained_model_name_or_path: "naver-clova-ix/donut-base" # loading a pre-trained model (from model hub or path)
dataset_name_or_paths: ["/content/drive/MyDrive/my_VDU"] # loading datasets (from model hub or path)
sort_json_key: False # cord dataset is preprocessed, and publicly available at
train_batch_sizes: [1]
val_batch_sizes: [1]
input_size: [1280, 960] # when the input resolution differs from the pre-training setting, some weights will be newly initialized (but the model training would be okay)
max_length: 768
align_long_axis: False
num_nodes: 1
seed: 2022
lr: 3e-5
warmup_steps: 300 # 800/8*30/10, 10%
num_training_samples_per_epoch: 800
max_epochs: 80
max_steps: -1
num_workers: 8
val_check_interval: 1.0
check_val_every_n_epoch: 10
gradient_clip_val: 1.0
verbose: True
I have trained using this configuration for 300, 200, 120, 80, 40, and 20 epochs, but all the results were misspelled and the numbers were wrong. I don't know if I am doing something wrong, whether I should make some tweaks, or whether I should increase my training data. I even tried combining in the 200-image synthdog data, but no luck; the results were still misspelled.
60 might be too small a sample. I trained with around ~1200 images and it was better. If you have fewer images, train for fewer epochs and use a smaller learning rate than 3e-5. Your warmup_steps should also be higher, around 800, since they recommend 10% of the total steps.
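The 10% heuristic in the config comment above (800/8*30/10 = 300) can be sketched as follows (a minimal sketch; the function name and the batch-size/epoch arguments are illustrative, not from the Donut repo):

```python
def warmup_steps(num_samples: int, batch_size: int, epochs: int,
                 fraction: float = 0.1) -> int:
    """Warmup steps as a fraction (default 10%) of the total optimizer steps."""
    steps_per_epoch = num_samples // batch_size
    total_steps = steps_per_epoch * epochs
    return int(total_steps * fraction)

# Reproduces the comment in the config above: 800/8*30/10
print(warmup_steps(num_samples=800, batch_size=8, epochs=30))  # 300
```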
@SamSamhuns thanks for the feedback, but is my annotation format correct? I was getting an error on this assert in utils: assert "gt_parse" in ground_truth and isinstance(ground_truth["gt_parse"], dict)
Traceback (most recent call last):
File "train.py", line 152, in <module>
train(config)
File "train.py", line 88, in train
sort_json_key=config.sort_json_key,
File "/content/donut/donut/util.py", line 76, in __init__
assert "gt_parse" in ground_truth and isinstance(ground_truth["gt_parse"], dict)
AssertionError
This part would be helpful to you: https://github.com/clovaai/donut#data
If you have more than one ground truth per image, please use gt_parses, not gt_parse.
Hope this helps :)
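A minimal conversion sketch for that case (field names taken from the example annotations earlier in the thread; the record layout follows the metadata.jsonl shown above):

```python
import json

# Multiple annotations for a single image (from the example above)
annotations = [
    {"Name": "Tom", "Buyer": "Conda", "contact_number": "989898989898",
     "alt_number": "55555555", "Buyer_id": "9856321023"},
    {"Name": "Hanks", "Buyer": "Conda", "contact_number": "99999999999",
     "alt_number": "25823102", "Buyer_id": "9856321024"},
]

# More than one ground truth per image: wrap them in "gt_parses" (a list),
# not "gt_parse" (which the util.py assert requires to be a dict)
record = {
    "file_name": "1.png",
    "ground_truth": json.dumps({"gt_parses": annotations}),
}
print(json.dumps(record))
```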
@SamSamhuns is there any explanation for the warmup_steps formula? I have 4790 samples to train on.