
Fine-tuning on custom data

Open · siamakzd opened this issue 2 years ago · 3 comments

Thank you for sharing your great work!

If I want to fine-tune on a custom dataset, what steps should I follow? In particular:

  • What is the input data format for training, testing, and inference?
  • Which scripts do we need to modify?

Thanks in advance!

siamakzd · Mar 30 '22 03:03

Hi, I think the main steps should be:

  • Organize your dataset into the FUNSD or XFUND format, depending on whether your dataset is monolingual or multilingual.
  • Put YourDataset.py under LiLTfinetune/data/datasets/. You can refer to funsd.py/xfun.py.
  • Put run_YourDataset_YourTask.py under examples/. You can refer to run_funsd.py/run_xfun_re.py/run_xfun_ser.py.
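For the first two steps, here is a minimal sketch under stated assumptions. It assumes the FUNSD annotation layout as published with that dataset (one JSON file per page image, with a top-level "form" list whose entries carry "id", "text", "box", "label", "words", and "linking") and pixel boxes given as [x0, y0, x1, y1]. The helpers make_funsd_entity, normalize_bbox, and form_to_bio are hypothetical names for illustration, not functions from LiLT. form_to_bio only approximates what a funsd.py-style loader does (drop empty words, tag "other" as O, use B-/I- prefixes otherwise, scale boxes to 0-1000). XFUND files wrap their entries differently, so compare against xfun.py before reusing this for multilingual data.

```python
import json
from typing import Dict, List, Optional, Tuple

# A word is (text, [x0, y0, x1, y1]) in pixel coordinates of the page image.
Word = Tuple[str, List[int]]


def union_box(boxes: List[List[int]]) -> List[int]:
    """Smallest box covering all given boxes."""
    return [min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes)]


def make_funsd_entity(entity_id: int, label: str, words: List[Word],
                      linking: Optional[List[List[int]]] = None) -> Dict:
    """Build one entry of a FUNSD-style "form" list (hypothetical helper)."""
    return {
        "id": entity_id,
        "label": label,  # FUNSD labels: "header", "question", "answer", "other"
        "text": " ".join(t for t, _ in words),
        "box": union_box([b for _, b in words]),
        "words": [{"text": t, "box": b} for t, b in words],
        "linking": linking or [],  # pairs of entity ids, e.g. [[question_id, answer_id]]
    }


def normalize_bbox(box: List[int], width: int, height: int) -> List[int]:
    """Scale pixel coordinates to the 0-1000 range used by LayoutLM-style models."""
    return [int(1000 * box[0] / width), int(1000 * box[1] / height),
            int(1000 * box[2] / width), int(1000 * box[3] / height)]


def form_to_bio(form: List[Dict], width: int, height: int):
    """Flatten entities into word-level tokens, normalized boxes and BIO tags."""
    tokens, bboxes, tags = [], [], []
    for entity in form:
        # Skip words whose text is empty or whitespace only.
        words = [w for w in entity["words"] if w["text"].strip()]
        for i, w in enumerate(words):
            tokens.append(w["text"])
            bboxes.append(normalize_bbox(w["box"], width, height))
            if entity["label"] == "other":
                tags.append("O")
            else:
                prefix = "B-" if i == 0 else "I-"
                tags.append(prefix + entity["label"].upper())
    return tokens, bboxes, tags


# Usage: one question/answer pair on a 1000x1000 page.
q = make_funsd_entity(0, "question", [("Date:", [50, 40, 110, 60])], linking=[[0, 1]])
a = make_funsd_entity(1, "answer", [("March", [120, 40, 180, 60]), ("30", [185, 40, 210, 60])],
                      linking=[[0, 1]])
print(json.dumps({"form": [q, a]}, indent=2))
tokens, bboxes, tags = form_to_bio([q, a], width=1000, height=1000)
print(tags)  # ['B-QUESTION', 'B-ANSWER', 'I-ANSWER']
```

Writing json.dumps({"form": [...]}) to one file per page image should give annotations in roughly the shape a FUNSD loader reads; aligning words to sub-word tokens (and the -100 labels) is still left to the run_*.py script.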

If you want to do something beyond training and evaluation, you can add your code right after the model makes its predictions, for example at https://github.com/jpWang/LiLT/blob/main/examples/run_funsd.py#L345 in run_funsd.py.
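As an illustration of that kind of post-processing, here is a small sketch that turns per-token logits into label names. It assumes, as in Hugging Face's token-classification examples, that predictions come back as logits of shape (batch, sequence, num_labels) and that positions to ignore (special tokens, non-first sub-words) carry the label -100. LABEL_LIST and decode_predictions are hypothetical; use the label list your dataset script actually defines. Plain Python is used so the logic is visible; with NumPy arrays you would typically take the argmax over the last axis instead.

```python
from typing import List, Sequence

# Hypothetical label list and order; use the one your dataset script defines.
LABEL_LIST = ["O", "B-HEADER", "I-HEADER", "B-QUESTION", "I-QUESTION", "B-ANSWER", "I-ANSWER"]


def decode_predictions(logits: Sequence[Sequence[Sequence[float]]],
                       labels: Sequence[Sequence[int]],
                       label_list: Sequence[str] = LABEL_LIST) -> List[List[str]]:
    """Turn per-token logits into label names, dropping positions labelled -100."""
    decoded = []
    for seq_logits, seq_labels in zip(logits, labels):
        names = []
        for token_logits, gold in zip(seq_logits, seq_labels):
            if gold == -100:  # special token or non-first sub-word: ignored
                continue
            # Index of the highest logit for this token (argmax).
            best = max(range(len(token_logits)), key=token_logits.__getitem__)
            names.append(label_list[best])
        decoded.append(names)
    return decoded


# Usage: one sequence of three tokens, the first of which is ignored.
logits = [[[0.1, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0],    # [CLS], ignored
           [0.2, 0.1, 0.0, 3.0, 0.0, 0.0, 0.0],    # argmax 3 -> B-QUESTION
           [0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0]]]   # argmax 5 -> B-ANSWER
print(decode_predictions(logits, [[-100, 3, 5]]))  # [['B-QUESTION', 'B-ANSWER']]
```

At real inference time there are no gold labels, so the mask has to come from somewhere else, for example the word_ids() of a fast tokenizer's encoding, keeping only the first sub-word of each word.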

jpWang · Mar 30 '22 07:03

Hi,

See also my demo notebooks here: https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LiLT

NielsRogge · Nov 21 '22 18:11

Hello,

Could you let me know how to organize a custom dataset into the FUNSD/XFUND format?

Also, is there a tutorial you would recommend for this step?

Thank you in advance.

hamzabchiri · May 14 '23 22:05