Scripts for parsing the Wikipedia articles?

Open crystina-z opened this issue 3 years ago • 0 comments

I'm looking at how to prepare the passages from the raw Wikipedia dump (downloaded from the links in the repo), but I'm unsure how the passage boundaries are determined. Is the script for this available anywhere, in case I missed it?

And thanks for this great work!

crystina-z · May 27 '21 20:05