
DocVQA input input_ids at training time

Open cccccckt opened this issue 5 months ago • 1 comments

I'm not sure whether the problem is in my data processing or whether my metadata.jsonl file was created incorrectly. I found that the input_ids fed to the Donut model at training time contain the answer part. Is this normal? Here are the input_ids:

tensor([[57527, 57529, 11604, 52743, 48941, 45383, 18528, 43095, 36477, 46385, 35647, 36209, 57524, 57526, 46481, 23485, 35815, 4768, 57523, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:2')

I used tokenizer to decode it and got the result: <s_docvqa> <s_question> ▁When ▁is ▁the ▁response ▁code ▁request ▁form ▁dat ed ? </s_question> <s_answer> ▁September ▁10 , ▁1996 </s_answer> </s> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad>
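To see exactly where the answer starts in the tensor, the non-padding IDs can be lined up with the decoded tokens. The snippet below is a minimal sketch that uses only the values quoted above; it assumes the decoded string lists one token per ID, which the matching lengths suggest but do not prove. The names `id_of` and `answer_start` are illustrative.

```python
# Non-padding IDs and decoded tokens, copied from the post above.
ids = [57527, 57529, 11604, 52743, 48941, 45383, 18528, 43095, 36477, 46385,
       35647, 36209, 57524, 57526, 46481, 23485, 35815, 4768, 57523, 2]
tokens = ("<s_docvqa> <s_question> ▁When ▁is ▁the ▁response ▁code ▁request "
          "▁form ▁dat ed ? </s_question> <s_answer> ▁September ▁10 , ▁1996 "
          "</s_answer> </s>").split()

# Assumption: one decoded token per ID (the two lists have equal length).
assert len(ids) == len(tokens)

# Map each token string to its ID and locate the start of the answer span.
id_of = dict(zip(tokens, ids))
answer_start = tokens.index("<s_answer>")

for position, (token_id, token) in enumerate(zip(ids, tokens)):
    note = "  (answer side)" if position > answer_start else ""
    print(f"{position:2d}  {token_id:5d}  {token}{note}")
```

Under that assumption, `<s_answer>` is ID 57526 and everything after position 13 belongs to the answer, followed by `</s>` (ID 2) and padding (ID 1).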

From my experience, and based on the code provided by the author, the prompt should look like this: <s_docvqa> <s_question><question></s_question> <s_answer>
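One possible explanation is teacher forcing: in many sequence-to-sequence training setups the decoder is fed the full target sequence (prompt plus answer), and the loss is restricted to the answer tokens by masking the labels, while the bare prompt is only used at inference. The sketch below illustrates that masking on the IDs from this post. It is not the author's code, and whether Donut's training loop does exactly this should be checked against the repository. `make_labels` is a hypothetical helper, `IGNORE_ID = -100` is an assumption based on the usual PyTorch cross-entropy convention, and the `<s_answer>` and `<pad>` IDs are inferred from the decoded output above.

```python
IGNORE_ID = -100          # assumption: common "ignore" label value for cross-entropy loss
PAD_ID = 1                # <pad>, per the decoded output in this post
ANSWER_START_ID = 57526   # <s_answer>, inferred by aligning IDs with decoded tokens


def make_labels(input_ids):
    """Keep only answer-side tokens as labels; mask the prompt and padding."""
    prompt_end = input_ids.index(ANSWER_START_ID)
    labels = []
    for position, token_id in enumerate(input_ids):
        if position <= prompt_end or token_id == PAD_ID:
            labels.append(IGNORE_ID)   # no loss on prompt or padding positions
        else:
            labels.append(token_id)    # loss on answer tokens, </s_answer>, </s>
    return labels


# IDs from the post; padding is shortened to three tokens for the illustration.
input_ids = [57527, 57529, 11604, 52743, 48941, 45383, 18528, 43095, 36477,
             46385, 35647, 36209, 57524, 57526, 46481, 23485, 35815, 4768,
             57523, 2, 1, 1, 1]
labels = make_labels(input_ids)
print(labels)
```

With this kind of masking, the answer tokens appear in input_ids during training but the model is only penalized for predicting them, not for reproducing the question.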

Part of my metadata.jsonl file is as follows:
{"file_name": "sxxj0037_2.png", "ground_truth": "{\"gt_parses\": [{\"question\": \"How many points are there in modifications to readout instrumentation\", \"answer\": \"5.\"}]}"}
{"file_name": "tynx0037_1.png", "ground_truth": "{\"gt_parses\": [{\"question\": \"What is the first line of the address mentioned at the top?\", \"answer\": \"Reynolds Building\"}, {\"question\": \"What is the date mentioned?\", \"answer\": \"May 4, 2000\"}]}"}
{"file_name": "mtyj0226_1.png", "ground_truth": "{\"gt_parses\": [{\"question\": \"What is the word written in bold black in the first picture?\", \"answer\": \"Coke\"}]}"}
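Because ground_truth is itself a JSON string nested inside each JSON line, a quick way to sanity-check the file is to decode it twice and print the question/answer pairs. The following is a minimal sketch using one of the lines above; `to_sequence` is a hypothetical helper that mimics the tag layout seen in the decoded input_ids, and Donut's own conversion routine may differ in spacing and details.

```python
import json

# One line from metadata.jsonl, copied from the post (raw string keeps the \" escapes).
line = (
    r'{"file_name": "tynx0037_1.png", "ground_truth": "{\"gt_parses\": '
    r'[{\"question\": \"What is the first line of the address mentioned at the top?\", '
    r'\"answer\": \"Reynolds Building\"}, '
    r'{\"question\": \"What is the date mentioned?\", \"answer\": \"May 4, 2000\"}]}"}'
)

record = json.loads(line)                         # outer JSON object
ground_truth = json.loads(record["ground_truth"])  # inner JSON string
parses = ground_truth["gt_parses"]


def to_sequence(qa, task="docvqa"):
    """Hypothetical helper: render one QA pair in the tag layout shown above."""
    return (f"<s_{task}><s_question>{qa['question']}</s_question>"
            f"<s_answer>{qa['answer']}</s_answer>")


sequences = [to_sequence(qa) for qa in parses]
for sequence in sequences:
    print(record["file_name"], sequence)
```

If both decoding steps succeed and each entry in gt_parses has a question and an answer, the file structure itself is probably fine, and the answer in input_ids would come from how training sequences are built rather than from a malformed metadata.jsonl.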

cccccckt, Sep 02 '24 14:09