Vechtomov comments

Results 20 comments of


                                            Vechtomov

Create ivypanda essays dataset

Here is the result: https://huggingface.co/datasets/qwedsacf/ivypanda-essays I parsed the essay title and content, removed ad blocks, and then used [insciptis](https://pypi.org/project/inscriptis/) library to convert html to txt, so even tables and lists...

Poetry instructions from dataset

Hi, thanks. We don't need separation on train, test and validation. Can you combine all in one file?

Create homework-lab essays dataset

Actually I did it already. Here is the result: https://huggingface.co/datasets/qwedsacf/homework-lab-essays But I only scraped the data without preprocessing. Essays were in .doc and .docx files so I extracted text via...

Show the user's score and place in the leaderboard

Hi, I only proposed the feature. If you want to implement this you can ask to assign you on this issue.

dataset of 10k novel story descriptions submit @Vechtomov

Hi, thanks for contributing. Follow [this guide](https://github.com/LAION-AI/Open-Assistant/tree/main/openassistant/datasets) and make a pull request linked to this issue.

Create training a transformer-based model

The whole pull request looks like it was generated by a language model. @theblackcat102 I think we can close it.

Recipes dataset. Resolves #1031

Can you add a size label to the HF readme? Here is an example: https://huggingface.co/datasets/qwedsacf/competition_math/blob/main/README.md

Recipes dataset. Resolves #1031

Resolves #1031

Proposal: Use OA compatible jsonl message format for multi-turn conversations

Obviously jsonl is easier for storing and processing dialogs and especially multi-turn dialogs. I think we can use it for these types of datasets. /cc @christophschuhmann

Proposal: Use OA compatible jsonl message format for multi-turn conversations

I'll make a PR. But I found a little confusing behavior: when you upload a `jsonl` file via `Dataset("dataset.jsonl").push_to_hub(...)` it is converted into parquet. Also even if you upload the...