Create homework-lab essays dataset

Open Vechtomov opened this issue 2 years ago • 2 comments

Create an essay dataset from https://homework-lab.com/examples/

Vechtomov, Jan 21 '23 18:01

Actually, I already did this. The result is at https://huggingface.co/datasets/qwedsacf/homework-lab-essays but I only scraped the data, without any preprocessing. The essays were in .doc and .docx files, so I extracted the text with the textract library. As a result, the texts contain a lot of extra spaces and tabs.
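
For anyone who wants to clean this up, here is a minimal sketch of one possible whitespace pass over the extracted text. It is not part of the published dataset's pipeline; the function name `normalize_whitespace` and the exact rules (collapse runs of spaces and tabs, trim each line, keep at most one blank line between paragraphs) are assumptions chosen for illustration.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse stray spaces/tabs and excess blank lines (hypothetical helper)."""
    # Unify Windows and old Mac line endings so later steps only see "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces, tabs, and non-breaking spaces into one space.
    text = re.sub(r"[ \t\u00a0]+", " ", text)
    # Trim leading and trailing whitespace on every line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Keep at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    raw = "Title\t\t of   essay \r\n\r\n\r\n\r\n  First\tparagraph.  "
    print(normalize_whitespace(raw))
```

This could be run once per essay before re-uploading. Whether paragraph breaks should be kept at all depends on how the essays will be used, so treat the rules as a starting point.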

Vechtomov, Jan 21 '23 18:01

You are a hero :)

christophschuhmann, Jan 22 '23 09:01