Richard Nagyfi
First of all, thank you very much for this learning material; I really wish there were similar tutorials for other engineering areas! I think the central limit theorem is slightly...
A notebook that automatically generates Q&A pairs from the contents of the WikiData knowledge graph to accelerate prompt generation. Added README.md with step-by-step instructions as well as the Jupyter...
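The core idea can be sketched as a SPARQL query against the public WikiData endpoint whose results are filled into a question template. This is a minimal illustration, not the notebook's actual code; the single relation (P36, "capital") and the question wording are assumptions.

```python
import requests

# Minimal sketch: build factoid Q&A pairs from one WikiData relation.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?countryLabel ?capitalLabel WHERE {
  ?country wdt:P31 wd:Q6256;   # instance of: country
           wdt:P36 ?capital.   # capital
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""

response = requests.get(
    SPARQL_ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "qa-pair-generator/0.1"},  # WikiData asks clients to identify themselves
)
response.raise_for_status()

qa_pairs = [
    {
        "question": f"What is the capital of {row['countryLabel']['value']}?",
        "answer": row["capitalLabel"]["value"],
    }
    for row in response.json()["results"]["bindings"]
]
print(qa_pairs[:3])
```

Templating more relations (birthplace, author, inventor, ...) is then just a matter of swapping the property ID and the question template.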
https://www.gutenberg.org/ has an extensive collection of ebooks in multiple languages and formats that would make great training data.
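A minimal sketch of turning one Gutenberg book into training text, assuming the common cache URL pattern https://www.gutenberg.org/cache/epub/&lt;id&gt;/pg&lt;id&gt;.txt (which holds for most, but not all, books) and stripping the license boilerplate around the body:

```python
import requests

def fetch_gutenberg_text(book_id: int) -> str:
    """Download one plain-text ebook; the cache URL pattern is an assumption."""
    url = f"https://www.gutenberg.org/cache/epub/{book_id}/pg{book_id}.txt"
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text

def strip_license_boilerplate(text: str) -> str:
    # Project Gutenberg wraps the book in *** START/END ... *** markers;
    # training data should keep only the text in between.
    start = text.find("*** START")
    end = text.find("*** END")
    if start != -1 and end != -1:
        text = text[text.index("\n", start) + 1 : end]
    return text.strip()

book = strip_license_boilerplate(fetch_gutenberg_text(1342))  # 1342 = Pride and Prejudice
print(book[:200])
```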
Scrape MEK OSZK (Hungarian Electronic Library) for books and upload the data to HF.
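Once the books are scraped, uploading them is straightforward with the `datasets` library; the record schema and the repo id below are placeholders, not an agreed format.

```python
from datasets import Dataset

# Hypothetical records scraped from MEK; schema and repo id are placeholders.
records = {
    "title": ["Példa könyv"],
    "author": ["Ismeretlen"],
    "text": ["A könyv teljes szövege ..."],
    "source_url": ["https://mek.oszk.hu/..."],
}

dataset = Dataset.from_dict(records)
dataset.push_to_hub("your-username/mek-oszk-books")  # requires `huggingface-cli login` first
```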
Copy the "Manually-generated factoid question/answer pairs with difficulty ratings from Wikipedia articles. Dataset includes articles, questions, and answers." dataset to HF.

> Please cite this paper if you write any...
Copy the Ubuntu Dialogue Corpus to HF: https://www.kaggle.com/datasets/rtatman/ubuntu-dialogue-corpus. See if it can be further cleaned (some answers are low quality); a rough pass is sketched below.
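A first cleaning pass might drop empty, one-word, and link-only turns. The column names assume the Kaggle CSV schema (folder, dialogueID, date, from, to, text), and the filters are heuristics, not agreed-upon rules.

```python
import pandas as pd

# Rough cleaning sketch; file name and column names are assumptions
# based on the Kaggle release of the corpus.
df = pd.read_csv("Ubuntu-dialogue-corpus/dialogueText.csv")

df = df.dropna(subset=["text"])
df = df[df["text"].str.len() > 3]                        # drop one-word noise like "ok"
df = df[~df["text"].str.match(r"^\s*https?://\S+\s*$")]  # drop link-only turns

df.to_csv("ubuntu_dialogues_cleaned.csv", index=False)
```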
1) Copy the OpenSubtitles dataset to HF: https://opus.nlpl.eu/OpenSubtitles-v2018.php
2) Optionally scrape more subtitles from other sources, as long as they are multilingual and their timestamps can be matched with other languages (see the alignment sketch below).
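Cross-language matching can be approximated by pairing subtitle lines whose start times are close. A sketch, assuming .srt input and a one-second tolerance; the file names and the heuristic itself are hypothetical, not how OPUS actually aligns.

```python
import re
from datetime import timedelta

TIME_RE = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+) --> ")

def parse_srt(path: str) -> list[tuple[timedelta, str]]:
    """Return (start_time, text) pairs from a .srt file."""
    entries = []
    for block in open(path, encoding="utf-8").read().split("\n\n"):
        lines = [line for line in block.strip().splitlines() if line]
        if len(lines) < 3:
            continue  # need index line, time line, and at least one text line
        match = TIME_RE.match(lines[1])
        if not match:
            continue
        h, m, s, ms = map(int, match.groups())
        start = timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)
        entries.append((start, " ".join(lines[2:])))
    return entries

def align(src, tgt, tolerance=timedelta(seconds=1)):
    """Pair source/target lines whose start times are within `tolerance`."""
    pairs, j = [], 0
    for start, text in src:
        while j < len(tgt) and tgt[j][0] < start - tolerance:
            j += 1
        if j < len(tgt) and abs(tgt[j][0] - start) <= tolerance:
            pairs.append((text, tgt[j][1]))
    return pairs

pairs = align(parse_srt("movie.en.srt"), parse_srt("movie.hu.srt"))
```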
I think this could boost the competitiveness of smaller or newer languages. Even though they have no chance of ever beating the larger ones in the number of new messages, seeing...
- Reupload the data to HF
- Move all metadata columns to JSON meta
- Move the Gutenberg crawler to datasets/
- Update its loader / init scripts
- updated...
- Updated the dataset to match the new schema for both English and multilingual Project Gutenberg eBooks
- Added a link to HF text datasets to __init__.py
- Moved the Gutenberg Crawler from...