Some questions about training dataset

Open MikeDean2367 opened this issue 4 months ago • 0 comments

Great work!

I executed the following command and obtained the data file named wikipedia_links_aligned_spans.json in the folder ~/.cache/refined/datasets.

python3 src/refined/training/train/train.py --experiment_name test

I have two questions regarding this file:

  • Is wikipedia_links_aligned_spans.json the training data?
  • If so, which fields are used for training? The file contains three fields: hyperlinks_clean, hyperlinks, and predicted_spans. I'm not familiar with these three fields, and I'm unsure how to obtain the training data from them.
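
For reference, the top-level fields of the file can be listed with something like the sketch below. It assumes the file stores one JSON object per line, which has not been confirmed for ReFinED; iter_records and summarize_fields are hypothetical helper names for illustration, not part of the ReFinED API.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_records(path, limit: int = 3) -> Iterator[dict]:
    """Yield up to `limit` records, ASSUMING one JSON object per line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if count >= limit:
                break
            yield json.loads(line)
            count += 1


def summarize_fields(path, limit: int = 3) -> dict:
    """Map each top-level field name to the type name of its first seen value."""
    summary = {}
    for record in iter_records(path, limit):
        for key, value in record.items():
            summary.setdefault(key, type(value).__name__)
    return summary


if __name__ == "__main__":
    # Path taken from the issue text; the file must already have been downloaded.
    dataset = (
        Path.home() / ".cache" / "refined" / "datasets"
        / "wikipedia_links_aligned_spans.json"
    )
    if dataset.exists():
        print(summarize_fields(dataset))
```

Reading only the first few records keeps the check cheap even if the file is large. If the file turns out to be a single JSON document instead of line-delimited records, json.loads on the first line will likely fail and the whole file would need to be parsed with json.load.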

Thanks!

MikeDean2367, May 01 '24 11:05