table-transformer
How to load a fine-tuned model into Hugging Face Table Transformer
Hi,
I recently tried fine-tuning the Table Transformer model with a small dataset, and I was wondering if there is a way to load the resulting model into Hugging Face's TableTransformerForObjectDetection.
When I try to load it from a path to the .pth file, it asks for a config. If I pass the config (structure_recognition.json), it drops a lot of the weights because the state dict structure does not match.
Any help regarding this would be really nice!
Thanks, Prabhav
Hey @Prabhav55! You should probably use the Hugging Face model for fine-tuning. There are some differences between the plain PyTorch model here in this repo and the HF model.
As TATR is just a DETR, you can use the notebook for fine-tuning a DETR as a reference.
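For orientation, here is a minimal sketch of the HF-side API for TATR (not the fine-tuning notebook itself; the checkpoint name is the public structure recognition model, the image path is a placeholder, and the 0.7 threshold is arbitrary):

```python
import torch
from PIL import Image
from transformers import DetrImageProcessor, TableTransformerForObjectDetection

# Load the public structure recognition checkpoint; swap in your own
# fine-tuned weights once they are in HF format.
processor = DetrImageProcessor()
model = TableTransformerForObjectDetection.from_pretrained(
    "microsoft/table-transformer-structure-recognition"
)

# "table_crop.png" is a placeholder for an image already cropped to a table.
image = Image.open("table_crop.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert the predictions back to absolute (xmin, ymin, xmax, ymax) boxes.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, threshold=0.7, target_sizes=target_sizes
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```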
Hey @thiagodma, I also noticed some differences between the Hugging Face model and the original model. Did you manage to understand why that is? Also, were you able to fine-tune the Hugging Face model? If yes, can you point out the changes to make in the notebook mentioned here?
The only difference is that the Table Transformer applies a "normalize before" operation, which means that layernorms are applied before, rather than after, the MLP/attention sublayers.
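Schematically, the ordering difference looks like this (an illustration only, not the actual HF implementation):

```python
# Schematic only: the same self-attention/MLP sublayer, wrapped two ways.

def post_norm_block(x, sublayer, norm):
    # plain DETR ("normalize after"): residual add first, then layernorm
    return norm(x + sublayer(x))

def pre_norm_block(x, sublayer, norm):
    # Table Transformer ("normalize before"): layernorm first, residual add after
    return x + sublayer(norm(x))
```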
Thanks
A comment here - I haven't dug deeper into the HF model's innards, but I've gone through the full TATR repo, and there are significant steps in how the images are processed that I'm not sure are implemented in the HF model. This relates to the conversation about whether, for structure recognition in particular, images should be tightly cropped around the bbox (more recent paper) or have some padding around them (older paper). The current weights are based on the older paper, so with more padding, while the code in the repo (at least the image pre-processing scripts) assumes tighter cropping. The result is completely different training data, which may or may not impact results, but from experience I can tell that training this model from scratch takes about a week, so you might want to take that into consideration before going with the HF model.
I'm not a CV expert (rather, a beginner), but I'd assume that with fine-tuning the issue is the same, especially for structure recognition. If the model has a million examples of padded images as training data and you fine-tune it with tightly cropped images, the model may not do what you want. I think I observed something like this a month ago, where the structure recognition model was always assuming there is padding around the table when doing inference, yielding bboxes that were smaller around the edges.
So I recommend at least going through what happens to the images before they are passed to training, starting from main.py and the files it links to. But as said, the weights do not match the code at the moment: the code targets the recent paper (those weights are not released yet) and the released weights are for the old paper (with different data transformations).
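To make the padding point concrete, here is a rough sketch of the two cropping styles (an illustration; the actual preprocessing lives in this repo's scripts and differs in its details):

```python
from PIL import Image

def crop_table(image: Image.Image, bbox, padding: int = 0) -> Image.Image:
    """Crop a table region from a page image.

    bbox is (xmin, ymin, xmax, ymax) in pixels. padding=0 gives a tight crop
    (newer paper / current code); padding>0 leaves context around the table
    (older paper / released weights).
    """
    xmin, ymin, xmax, ymax = bbox
    return image.crop((
        max(0, xmin - padding),
        max(0, ymin - padding),
        min(image.width, xmax + padding),
        min(image.height, ymax + padding),
    ))

# page = Image.open("page.png").convert("RGB")
# tight = crop_table(page, table_bbox)                # tight crop
# padded = crop_table(page, table_bbox, padding=30)   # crop with padding
```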
@Ashwani-Dangwal yeah, I managed to fine-tune it. Basically all I did was:

Replace

```python
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
```

with

```python
processor = DetrImageProcessor()
```

and replace

```python
DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50",
                                       revision="no_timm",
                                       num_labels=len(id2label),
                                       ignore_mismatched_sizes=True)
```

with

```python
TableTransformerForObjectDetection.from_pretrained(
    "microsoft/table-transformer-structure-recognition",
    ignore_mismatched_sizes=True,
)
```
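For context, a sketch of how those pieces fit together for a single training step, following the DETR fine-tuning notebook (the image path and the empty COCO-style annotations dict are placeholders for your own data):

```python
from PIL import Image
from transformers import DetrImageProcessor, TableTransformerForObjectDetection

processor = DetrImageProcessor()
model = TableTransformerForObjectDetection.from_pretrained(
    "microsoft/table-transformer-structure-recognition",
    ignore_mismatched_sizes=True,
)

# Placeholders: a cropped table image and a COCO-style target for it.
image = Image.open("table_crop.png").convert("RGB")
annotations = {"image_id": 0, "annotations": []}

encoding = processor(images=image, annotations=annotations, return_tensors="pt")
outputs = model(
    pixel_values=encoding["pixel_values"],
    labels=encoding["labels"],
)
loss = outputs.loss  # DETR-style set prediction loss, ready for loss.backward()
```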
@thiagodma For the dataset preparation, I wonder which dataset you fine-tuned on. If it was FinTabNet, did you use the code here to canonicalize the cells and get FinTabNet.c? And if you did use FinTabNet, how did you prepare the dataset for HF training?
@giuqoob I agree with you that training or fine-tuning takes days. I did some fine-tuning on FinTabNet using the code in this repo and saw good improvements in the scores. Still, I think there is some problem with the row detection. Did you notice the same issue? I wonder if it's due to bad training or just a limitation of DETR.
@WalidHadri-Iron I created the FinTabNet(.a6) and PubTables-1M datasets with the scripts provided and trained the model from scratch without a limit on batches per epoch, running it for 22 epochs. I haven't tested on my own data yet, but using the eval mode I got the results below, which are worse than what the authors report. I used the following params:
- `batch_size` at 8
- `num_workers` at 8
- `pre_fetch_factor` at 2
- `pin_memory` True
- `persist_workers` True
- `non_blocking` True
I didn't touch any other settings, like learning rate.
What results did you get?
For the model trained for 20 epochs:
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.785
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.949
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.874
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.513
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.700
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.814
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.407
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.776
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.852
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.602
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.777
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.887
----------------------------------------------------------------------------------------------------
Results on simple tables (47784 total):
Accuracy_Con: 0.8406
GriTS_Top: 0.9838
GriTS_Con: 0.9805
GriTS_Loc: 0.9715
--------------------------------------------------
Results on complex tables (55339 total):
Accuracy_Con: 0.5572
GriTS_Top: 0.9598
GriTS_Con: 0.9586
GriTS_Loc: 0.9382
--------------------------------------------------
Results on all tables (103123 total):
Accuracy_Con: 0.6885
GriTS_Top: 0.9709
GriTS_Con: 0.9688
GriTS_Loc: 0.9536
--------------------------------------------------
Total time taken for 103123 samples: 13:09:16.739213
COCO metrics summary: AP50: 0.949, AP75: 0.874, AP: 0.785, AR: 0.852
> @thiagodma For the dataset preparation, I wonder which dataset you fine-tuned on. If it was FinTabNet, did you use the code here to canonicalize the cells and get FinTabNet.c? And if you did use FinTabNet, how did you prepare the dataset for HF training?
@WalidHadri-Iron I fine-tuned it using a proprietary dataset
@thiagodma Thanks for the help. By the way, does the training data for the Hugging Face model have to be in Pascal VOC format, or is it something else?
@Ashwani-Dangwal I'm using COCO format, but I think this is easy to change. I guess all you have to do is change the PyTorch Dataset definition.
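For reference, the COCO-format dataset in that notebook looks roughly like this (a sketch; the class name, folder paths, and annotation file are placeholders):

```python
import torchvision
from transformers import DetrImageProcessor

class TableDataset(torchvision.datasets.CocoDetection):
    """COCO-format dataset that returns processor-encoded images and targets."""

    def __init__(self, img_folder, ann_file, processor):
        super().__init__(img_folder, ann_file)
        self.processor = processor

    def __getitem__(self, idx):
        # PIL image plus the list of COCO annotation dicts for that image
        img, target = super().__getitem__(idx)
        image_id = self.ids[idx]
        target = {"image_id": image_id, "annotations": target}
        encoding = self.processor(images=img, annotations=target, return_tensors="pt")
        pixel_values = encoding["pixel_values"].squeeze()  # drop the batch dim
        labels = encoding["labels"][0]
        return pixel_values, labels

processor = DetrImageProcessor()
train_dataset = TableDataset("images/train", "annotations/train.json", processor)
```

Switching to Pascal VOC would essentially mean rewriting `__getitem__` to build the same COCO-style target dict from the VOC XML.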
@WalidHadri-Iron I got the same kind of issue after fine-tuning on the FinTabNet dataset, where complete rows were not detected properly. Did you find a workaround for this?
@thiagodma Thanks Man!
@giuqoob Did you fine-tune on the complete FinTabNet dataset using the Hugging Face model as the base? Unfortunately, I fine-tuned on the same data using the main.py script, not the Hugging Face model.
If you did, would it be OK to share the FinTabNet weights? I would really appreciate that.
Thanks!
@giuqoob I just fine-tuned on FinTabNet. I kept the TSR configuration the same, except for the learning rates, which I changed from `"lr": 5e-5, "lr_backbone": 1e-5` to `"lr": 1e-5, "lr_backbone": 1e-6`. I fine-tuned for 15 epochs; there is an important jump in the metrics from the first epoch, and no big gain from epoch 10 to epoch 15. I haven't had the time to run the eval script, but here are the metrics for the best epoch, computed on a sample of the test set during training.
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.866
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.971
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.924
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.587
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.848
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.869
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.502
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.870
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.914
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.616
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.900
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.922
AP50: 0.971, AP75: 0.924, AP: 0.866, AR: 0.914
I will update with the full output of the eval code when I have time to run it. I am also thinking about changing the whole training setup to do better; this was just my first trial.
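For anyone reproducing this, the learning-rate change above amounts to editing the TSR training config before launching main.py; a minimal sketch, assuming the config file is src/structure_config.json (adjust the path to whichever config you pass to main.py):

```python
import json

# Lower the TSR learning rates as described above. Key names "lr" and
# "lr_backbone" are the ones referenced in this thread; the file path is
# an assumption about how your checkout of the repo is laid out.
with open("src/structure_config.json") as f:
    config = json.load(f)

config["lr"] = 1e-5           # was 5e-5
config["lr_backbone"] = 1e-6  # was 1e-5

with open("src/structure_config.json", "w") as f:
    json.dump(config, f, indent=4)
```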
@Ashwani-Dangwal: As a workaround, I put a low threshold for rows and enhanced the post-processing based on the text position and some characteristics.
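Something in the spirit of that workaround, continuing from an HF inference setup like the sketch earlier in this thread (results and model.config.id2label are assumed to already exist; the label name and threshold values are illustrative):

```python
def filter_predictions(results, id2label, default_threshold=0.6, class_thresholds=None):
    """Keep predictions using per-class confidence thresholds.

    `results` is one element of processor.post_process_object_detection(...),
    called with threshold=0.0 so that nothing is filtered out beforehand.
    """
    class_thresholds = class_thresholds or {}
    kept = []
    for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
        name = id2label[label.item()]
        if score >= class_thresholds.get(name, default_threshold):
            kept.append((name, score.item(), box.tolist()))
    return kept

# Keep rows at a much lower threshold than the other classes, then prune the
# extras downstream using text positions; the label name follows the TATR
# structure recognition label set.
predictions = filter_predictions(
    results, model.config.id2label, class_thresholds={"table row": 0.3}
)
```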
> Hi,
> I recently tried fine-tuning the Table Transformer model with a small dataset, and I was wondering if there is a way to load the resulting model into Hugging Face's TableTransformerForObjectDetection.
> When I try to load it from a path to the .pth file, it asks for a config. If I pass the config (structure_recognition.json), it drops a lot of the weights because the state dict structure does not match.
> Any help regarding this would be really nice!
> Thanks, Prabhav
To answer the initial question, check my comment here https://github.com/NielsRogge/Transformers-Tutorials/issues/316#issuecomment-1624001118
@WalidHadri-Iron @giuqoob The FinTabNet dataset available here can be processed to match the new paper using code from this repo. Would it be hard to point me to those folders?
Also, I wanted to understand which old paper you are referring to that the model was trained on.
Has anyone managed to train the Hugging Face model on the FinTabNet dataset?
> @WalidHadri-Iron @giuqoob The FinTabNet dataset available here can be processed to match the new paper using code from this repo. Would it be hard to point me to those folders?
> Also, I wanted to understand which old paper you are referring to that the model was trained on.
@bely66 The scripts are in here https://github.com/microsoft/table-transformer/tree/main/scripts.
As @giuqoob pointed out before, if anyone is interested in using the code here to train/infer/process some data, I would also recommend spending some minimum time exploring the files in the repo.
@WalidHadri-Iron, have you tried training the Hugging Face model on the FinTabNet dataset?
@Ashwani-Dangwal I have tried training it on FinTabNet, and it worked for me after converting the model weights using the following script: `convert_table_transformer_original_pytorch_checkpoint_to_pytorch.py`, present in the transformers repo.
@Prabhav55, I also trained the original model on FinTabNet, but it was not performing well. So I wanted to fine-tune the Hugging Face model on FinTabNet.
@Ashwani-Dangwal I did exactly the same as @Prabhav55.
If you're training on the original PubTables-1M and FinTabNet.c (FinTabNet.a6) together, then one reason you may see lower numbers during evaluation is that we changed how we evaluate on PubTables-1M in our most recent paper. To match these numbers, you have to run https://github.com/microsoft/table-transformer/blob/42867c86768388ca4cafd546178abfb15c63aed3/scripts/create_padded_dataset.py on the validation and test splits to more tightly crop those images. The training data stays the same, so do not run the script on the training split. This step is not yet documented, sorry about that.
And just to add extra clarification: if you trained on PubTables-1M and FinTabNet.c using the current code without doing this step, there is no need to redo the training. You only need to do this cropping step for evaluation to match our reported numbers. We will update the documentation for this soon.
Best, Brandon
> @WalidHadri-Iron @giuqoob The FinTabNet dataset available here can be processed to match the new paper using code from this repo. Would it be hard to point me to those folders? Also, I wanted to understand which old paper you are referring to that the model was trained on.
> @bely66 The scripts are in here https://github.com/microsoft/table-transformer/tree/main/scripts.
> As @giuqoob pointed out before, if anyone is interested in using the code here to train/infer/process some data, I would also recommend spending some minimum time exploring the files in the repo.
Yep, got your point, and I went through the repo. So basically the model is trained on the old annotations while the code expects the new annotations.
So if I fine-tune the model with the old annotations, will I still get bad results because of the code?
@bsmock Could you confirm that? If I'm fine-tuning on my data with the old annotations, would there be problems coming from the code?
Can you share the FinTabNet model here?
Hi,
See #158
Hi @Prabhav55. Were you able to load the model after training it? It asks for a config, and when I give it the config, some params are dropped and I'm unable to load it. If you managed it, can you please let me know how you did it? The Python file "convert_table_transformer_original_pytorch_checkpoint_to_pytorch.py" is also not available at the mentioned link anymore. If you have it, can you please share it with me? Thanks in advance @bsmock @NielsRogge