FUDGE

Linking for distant words

Open harshsummit opened this issue 1 year ago • 6 comments

Hey @herobd, I was trying to extract the relations for keywords using the pretrained weights and your code.

But for distant boxes it doesn't seem to identify the linking. Is there any way to solve this?

harshsummit, Mar 21 '23 11:03

It could either be an artifact of the pretraining data (not having long relationships) or the Swin model having windowed attention. Have you tried fine-tuning on your data?

herobd, Mar 21 '23 16:03

I couldn't figure out what the structure of my dataset should look like. Can you please help me with that? @herobd

harshsummit, Mar 22 '23 06:03

Sorry, ignore my prior response about the Swin model attention (I thought this was an issue on Dessurt). The graph should have links that far across, but it's failing to merge the two parts of the key together (e.g. "9 Add lines..." and "9"). Fine-tuning is probably still a good thing to try.

You have a few choices with the data:

  1. Make it look like the NAF or FUNSD data and use one of those dataset loaders.
  2. Write your own child class of datasets/graph_pair.py. This mostly means writing the parseAnn function.
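
For the second option, a rough sketch of the kind of work a parser has to do is shown below. It flattens a FUNSD-style annotation (the `form`, `id`, `box`, `label`, `text`, and `linking` fields follow the public FUNSD JSON layout) into boxes, labels, and deduplicated link pairs. The function name `parse_funsd_style` and the returned dictionary are hypothetical and are not FUDGE's actual `parseAnn` signature or return values; check datasets/graph_pair.py for what the real method must return.

```python
from typing import Any


def parse_funsd_style(annotation: dict[str, Any]) -> dict[str, Any]:
    """Flatten a FUNSD-style annotation into boxes, labels, and link pairs.

    Hypothetical helper for illustration only; it does not mirror the
    signature or return values of parseAnn in datasets/graph_pair.py.
    """
    entities = annotation["form"]
    # Map annotation ids to contiguous indices so links can refer to list positions.
    index_of = {ent["id"]: i for i, ent in enumerate(entities)}

    boxes = [tuple(ent["box"]) for ent in entities]  # (x1, y1, x2, y2)
    labels = [ent["label"] for ent in entities]
    texts = [ent["text"] for ent in entities]

    # FUNSD stores each link on both endpoints, so collect pairs in a set.
    links = set()
    for ent in entities:
        for a, b in ent.get("linking", []):
            if a in index_of and b in index_of:
                i, j = index_of[a], index_of[b]
                links.add((min(i, j), max(i, j)))

    return {"boxes": boxes, "labels": labels, "texts": texts, "links": sorted(links)}


# Toy annotation: a key on the left linked to a value far to the right.
example = {
    "form": [
        {"id": 0, "box": [10, 10, 40, 30], "text": "9",
         "label": "question", "linking": [[0, 2]]},
        {"id": 1, "box": [50, 10, 300, 30], "text": "Add lines",
         "label": "question", "linking": []},
        {"id": 2, "box": [900, 10, 980, 30], "text": "1,200",
         "label": "answer", "linking": [[0, 2]]},
    ]
}
parsed = parse_funsd_style(example)
```

Printing `parsed["links"]` on your own annotations is a quick way to confirm that long-distance links are actually present in the training data before fine-tuning.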

Do you have annotations for your data?

herobd, Mar 22 '23 16:03

Yes, I have the annotations for my dataset, but I'm unable to fine-tune the model on it.

I even tried to fine-tune it on the FUNSD dataset by extracting the FUNSD.zip archive linked in the README and placing it inside the “data” folder.

But it throws an error saying that the value of num_classes should be more than 0, and it is currently 0.

harshsummit, Mar 23 '23 02:03

Does the dataset structure need to be modified before using it for training, or can we use FUNSD directly with train.py?

harshsummit, Mar 23 '23 02:03

What config are you using? And what is the exact error (including the line number)?

herobd, Mar 23 '23 16:03