
Inference with a fine-tuned model using HuggingFace TableTransformerForObjectDetection

sanprit opened this issue 1 year ago · 24 comments

  • I have fine-tuned the Microsoft model for table structure detection using this script: https://github.com/microsoft/table-transformer/tree/main#model-training
  • Training saved a .pth model checkpoint.
  • I can run inference with that checkpoint using https://github.com/microsoft/table-transformer/blob/main/src/inference.py
  • However, I want to use your HuggingFace inference script with the fine-tuned model.
  • While loading the model in that script, I get a warning and it does not detect the structure:
import torch
from PIL import Image
from transformers import TableTransformerConfig, TableTransformerForObjectDetection, DetrFeatureExtractor

file_path = 'images/' + images_list[0]
image = Image.open(file_path).convert("RGB")

configuration = TableTransformerConfig('structure_config.json')
feature_extractor = DetrFeatureExtractor(config=configuration)

encoding = feature_extractor(image, return_tensors="pt")
#model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-structure-recognition")
model = TableTransformerForObjectDetection.from_pretrained("model_5.pth", config=configuration)
with torch.no_grad():
    outputs = model(**encoding)
target_sizes = [image.size[::-1]]
results = feature_extractor.post_process_object_detection(outputs, threshold=0.6, target_sizes=target_sizes)[0]
plot_results(image, results['scores'], results['labels'], results['boxes'])

Error: the structure is not detected, and the following warning is shown:


- This IS expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

sanprit avatar Jun 09 '23 05:06 sanprit

@sanprit Any luck using the fine-tuned model?

Ashwani-Dangwal avatar Jun 30 '23 09:06 Ashwani-Dangwal

@NielsRogge

I am facing this issue as well. I cannot load any model weights, since the weight names do not match the ones in the original Table Transformer config. Is there a way to transform the weight names so that they match the ones TATR expects?

Regards, Prabhav

Prabhav55 avatar Jul 02 '23 18:07 Prabhav55

Can you please show the full warning? If you trained a TableTransformerForObjectDetection model, you should be able to load all weights when performing inference.

NielsRogge avatar Jul 03 '23 13:07 NielsRogge

Hey,

Just to clarify -

  • I trained the model using code provided by the official repository (https://github.com/microsoft/table-transformer/blob/main/src/main.py).
  • I used the config provided by the same repo (https://github.com/microsoft/table-transformer/blob/main/src/structure_config.json)
  • I then used the output -> model.pth and tried to load it into TableTransformerForObjectDetection using the below code:
from transformers import TableTransformerConfig, DetrImageProcessor, TableTransformerForObjectDetection

configuration = TableTransformerConfig('structure_config.json')
feature_extractor = DetrImageProcessor(config=configuration)
model_structure = TableTransformerForObjectDetection.from_pretrained("/home/ubuntu/DEV/tatr-finetuning/fintabnet-process/FinTabNet.c_Image_Structure_PASCAL_VOC/output/20230701070646/model_10.pth", config=configuration)

On doing this, the error I get is:

Some weights of TableTransformerForObjectDetection were not initialized from the model checkpoint at /home/ubuntu/DEV/tatr-finetuning/fintabnet-process/FinTabNet.c_Image_Structure_PASCAL_VOC/output/20230701070646/model_10.pth and are newly initialized: ['decoder.layers.1.final_layer_norm.weight', 'backbone.conv_encoder.model.layer3.2.bn3.running_var', 'backbone.conv_encoder.model.layer2.0.bn1.bias', 'backbone.conv_encoder.model.layer3.0.conv1.weight', 'backbone.conv_encoder.model.layer4.1.bn3.running_mean', 'backbone.conv_encoder.model.layer2.3.bn3.running_mean', 'encoder.layers.5.fc2.weight', 'backbone.conv_encoder.model.layer1.2.bn2.bias' ........

I have truncated the output.

Regards, Prabhav Singh

Prabhav55 avatar Jul 03 '23 14:07 Prabhav55

Hi,

To convert checkpoints from the original repo to the HF format, I'd recommend using the conversion script: https://github.com/huggingface/transformers/blob/main/src/transformers/models/table_transformer/convert_table_transformer_original_pytorch_checkpoint_to_pytorch.py.

So for that you need to git clone the Transformers library, and then run

python src/transformers/models/table_transformer/convert_table_transformer_original_pytorch_checkpoint_to_pytorch.py

However, you might need to tweak the script a bit to account for your new model.
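For a locally trained checkpoint, the overall flow that the script automates looks roughly like this. This is an untested sketch: rename_key() stands in for the key-mapping logic defined in the conversion script, the "model" unwrapping is an assumption about how the original repo saves checkpoints, and the config values simply mirror structure_config.json.

import torch
from transformers import TableTransformerConfig, TableTransformerForObjectDetection

# Load the raw checkpoint produced by the original repo's training code.
checkpoint = torch.load("model_10.pth", map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)  # unwrap if nested (assumption)

# Rename the original DETR-style parameter names to the names the HF model expects.
# rename_key() is a placeholder for the mapping the conversion script implements.
state_dict = {rename_key(name): tensor for name, tensor in state_dict.items()}

# Build an HF model with a configuration matching the training config, then load the weights.
config = TableTransformerConfig(backbone="resnet18", num_labels=6, num_queries=125)
model = TableTransformerForObjectDetection(config)
model.load_state_dict(state_dict)

# Save in HF format so from_pretrained() works afterwards.
model.save_pretrained("converted_model")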

NielsRogge avatar Jul 03 '23 17:07 NielsRogge

Hi @NielsRogge, I was wondering whether you made any changes to the model while uploading it to Hugging Face, because the results from the Hugging Face model and the original model on GitHub show variation in table structure recognition on the same images.

Ashwani-Dangwal avatar Jul 04 '23 06:07 Ashwani-Dangwal

@Prabhav55, @sanprit Can you please share what learning rate you used while fine-tuning and what the AP was after training? Also, did you change any other hyperparameters?

Ashwani-Dangwal avatar Jul 04 '23 07:07 Ashwani-Dangwal

Hi,

I am attaching the training parameters and the command I used to train (all from the original table-transformer repo):

{
    "lr":5e-5,
    "lr_backbone":1e-5,
    "batch_size":2,
    "weight_decay":1e-4,
    "epochs":20,
    "lr_drop":1,
    "lr_gamma":0.9,
    "clip_max_norm":0.1,
    
    "backbone":"resnet18",
    "num_classes":6,
    "dilation":false,
    "position_embedding":"sine",
    "emphasized_weights":{},
    
    "enc_layers":6,
    "dec_layers":6,
    "dim_feedforward":2048,
    "hidden_dim":256,
    "dropout":0.1,
    "nheads":8,
    "num_queries":125,
    "pre_norm":true,
    
    "masks":false,

    "aux_loss":false,
    
    "mask_loss_coef":1,
    "dice_loss_coef":1,
    "ce_loss_coef":1,
    "bbox_loss_coef":5,
    "giou_loss_coef":2,
    "eos_coef":0.4,
    
    "set_cost_class":1,
    "set_cost_bbox":5,
    "set_cost_giou":2,

    "device":"cuda",
    "seed":42,
    "start_epoch":0,
    "num_workers":1
}

Command:

python main.py --data_type structure --config_file structure_config.json --data_root_dir /path/to/structure_data

@NielsRogge I also tried the script you mentioned (convert_table_transformer_original_pytorch_checkpoint_to_pytorch.py), but the assertions in its validation step are failing.

Regards, Prabhav

Prabhav55 avatar Jul 04 '23 08:07 Prabhav55

@Prabhav55, thanks for the reply. Did you fine-tune the model or train it from scratch?

Ashwani-Dangwal avatar Jul 04 '23 09:07 Ashwani-Dangwal

@Ashwani-Dangwal I am not entirely sure about this. From what I could understand, main.py (https://github.com/microsoft/table-transformer/blob/main/src/main.py) builds the base DETR model and then trains it.

I assumed that if the model uploaded to HF was trained using the same script, my weights should also load into the same HF class.

Prabhav55 avatar Jul 04 '23 10:07 Prabhav55

@Prabhav55 What I meant to ask was whether the model you trained was fine-tuned from the checkpoint provided by the author (pubtables1m_structure_detr_r18.pth), or whether you trained the model from scratch using your own dataset?

Ashwani-Dangwal avatar Jul 04 '23 10:07 Ashwani-Dangwal

Hi,

Note that the logits that are verified here are the ones of the pre-trained detection and table structure recognition checkpoints. You will have different logits if you trained the model yourself.

It's definitely good practice to verify that the original model and the HF model give the same results.

NielsRogge avatar Jul 05 '23 07:07 NielsRogge

@NielsRogge Thanks for the reply. I ran both the original model and the Hugging Face model; their outputs are shown below.

Output of the original model: (image)

Output of the Hugging Face model: (image)

Do you have any idea why there is this much difference in table structure recognition on the same image?

Ashwani-Dangwal avatar Jul 05 '23 07:07 Ashwani-Dangwal

@Ashwani-Dangwal thanks a lot for visualizing, that seems like a bug. Are you seeing the same with the detection model?

Could you verify the logits of both the original model and the HF model on the same inputs? It could also be a difference in the postprocessing of the logits.

NielsRogge avatar Jul 05 '23 07:07 NielsRogge

@NielsRogge How do I print the logits of the Hugging Face model? The post-processing steps are the same for inference with both models; they are taken from the repo of Brandon Smock (author of the original model). The main functions used are 'objects_to_structures', 'structure_to_cells', and 'cells_to_csv'; these three functions cover all the post-processing steps in postprocess.py in the original repo.

Also, why is it that if I comment out the max size parameter in the original inference code (image), the detection model is no longer able to detect the table accurately? For example: the detected table region when the max size parameter is commented out (image) versus when max_size is set (image).

However, with the Hugging Face model, whether I use feature_extractor = DetrFeatureExtractor(do_resize=True, max_size=800) or just feature_extractor = DetrFeatureExtractor() without any parameters, I still get the same result, shown below: (image)
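In case the preprocessing is the culprit, the resize behaviour can be pinned down explicitly on the HF side. A small sketch, assuming a recent transformers version where DetrImageProcessor takes a size dict (older DetrFeatureExtractor versions use the size/max_size keyword arguments shown above instead):

from transformers import DetrImageProcessor

# Make the resizing explicit so both pipelines see images of the same size.
# The exact values here are an example, not the settings used by the original repo.
processor = DetrImageProcessor(
    do_resize=True,
    size={"shortest_edge": 800, "longest_edge": 1000},
)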

Ashwani-Dangwal avatar Jul 05 '23 07:07 Ashwani-Dangwal

@Ashwani-Dangwal To get the logits from the HF model: they are in the model output, so you can access them with

model(**encoding).logits

Could this problem be coming from the pre-processing of the image? @NielsRogge

WalidHadri-Iron avatar Jul 05 '23 08:07 WalidHadri-Iron

@Ashwani-Dangwal To get the logits from the HF model: they are in the model output, so you can access them with

model(**encoding).logits

Thank you

Ashwani-Dangwal avatar Jul 05 '23 09:07 Ashwani-Dangwal

@Ashwani-Dangwal Thanks for providing them, but there's no need to pollute the thread with all the values; posting just the first 3 of both the original and HF logits suffices. Also make sure that the inputs were prepared in the same way to obtain those logits.
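For reference, a minimal sketch of printing just that slice from the HF model (reusing the model and encoding objects from the earlier snippets; the original model's logits would be printed analogously from the original repo's inference code):

import torch

with torch.no_grad():
    outputs = model(**encoding)

# First 3 queries and their class scores -- enough to compare against the
# corresponding slice from the original model's output.
print(outputs.logits[0, :3, :7])
print(outputs.pred_boxes[0, :3, :])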

NielsRogge avatar Jul 05 '23 09:07 NielsRogge

@NielsRogge Sorry about that, I deleted that post. Here are the logits of the Hugging Face model:

tensor([[[-1.1852e+01, -5.1195e+00, 8.9091e+00, -7.7407e+00, -4.9734e+00, -3.5293e+00, 1.1821e+00], [-1.0989e+01, -6.1581e+00, -2.7933e+00, -3.7990e+00, -6.2274e+00, -5.7886e+00, 3.3452e+00], [-2.7404e+01, -8.7458e+00, -7.2998e+00, -1.3358e+01, -1.1446e+01, -9.4610e-01, 4.0263e+00]

Here are the logits of the original model:

tensor([[[-1.3317e+01, -6.4428e+00, 7.6415e+00, -8.4886e+00, -5.4992e+00, -3.6403e+00, 1.7129e+00], [-1.4038e+01, -7.8999e+00, -1.3723e+00, -4.5212e+00, -5.2498e+00, -5.5131e+00, 3.1864e+00], [-2.1346e+01, -9.2924e+00, -4.1696e+00, -1.0014e+01, -5.8311e+00, -1.5367e+00, 2.4998e+00]

I can confirm that the inputs were prepared in the same way, with the same amount of padding added after detecting the table, and that all parameters like max_resize are the same.

Ashwani-Dangwal avatar Jul 05 '23 09:07 Ashwani-Dangwal

@Ashwani-Dangwal Could you share the code snippets used to generate the above visualizations (perhaps as a GitHub gist)?

NielsRogge avatar Jul 06 '23 06:07 NielsRogge

(Quoting Prabhav's earlier comment with the training config, the training command, and the note that the conversion script's assertions fail.)

I just used the conversion code and it's working fine. Of course, if you keep the next two assertions and the weights are not the same, they are going to fail.

assert torch.allclose(outputs.logits[0, :3, :3], expected_logits, atol=1e-4)

assert torch.allclose(outputs.pred_boxes[0, :3, :3], expected_boxes, atol=1e-4)
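For a custom checkpoint, those hard-coded expected values only hold for the official PubTables-1M models, so one option (a sketch, not something the script does by itself) is to swap the assertions for a simple shape check:

# Sanity check for a custom checkpoint instead of comparing against the
# hard-coded logits/boxes of the official checkpoints.
print(outputs.logits.shape)      # expected: (1, num_queries, num_labels + 1)
print(outputs.pred_boxes.shape)  # expected: (1, num_queries, 4)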

WalidHadri-Iron avatar Jul 06 '23 16:07 WalidHadri-Iron

(Quoting the exchange above: Prabhav's training config and command, and WalidHadri-Iron's note that the conversion code works but the two assertions fail when the weights differ.)

Thanks a lot! I figured the same. Are you able to run inference with the Hugging Face method by loading the state dict? I was still getting a similar error after that.

Prabhav55 avatar Jul 06 '23 17:07 Prabhav55

@Prabhav55 Once you have done the conversion, I loaded the model using

TableTransformerForObjectDetection.from_pretrained(model_folder_path)

where model_folder_path is the path to the folder containing the three files you got from the conversion.
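As an illustration, a minimal inference sketch with such a converted checkpoint (the folder name, image path, and threshold below are placeholders; the folder is assumed to contain the config and weights written by the conversion script):

import torch
from PIL import Image
from transformers import DetrImageProcessor, TableTransformerForObjectDetection

model_folder_path = "converted_model/"  # hypothetical output folder of the conversion
model = TableTransformerForObjectDetection.from_pretrained(model_folder_path)
processor = DetrImageProcessor()  # or load the processor saved alongside the model, if present

image = Image.open("table.png").convert("RGB")  # placeholder image path
encoding = processor(image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**encoding)

# Rescale predictions to the original image size and keep confident detections.
target_sizes = [image.size[::-1]]
results = processor.post_process_object_detection(outputs, threshold=0.6, target_sizes=target_sizes)[0]
print(results["scores"], results["labels"], results["boxes"])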

WalidHadri-Iron avatar Jul 07 '23 11:07 WalidHadri-Iron

@NielsRogge I have added you as a collaborator, so you can check out the code for inference and visualization. Thank you.

Ashwani-Dangwal avatar Jul 10 '23 06:07 Ashwani-Dangwal