Release data generation scripts of GraphGen, for generating QA datasets on Hugging Face
Hi @ChenZiHong-Gavin 🤗
I'm Niels and work as part of the open-source team at Hugging Face. I discovered your work on arXiv and was wondering whether you would like to submit it to hf.co/papers to improve its discoverability. If you are one of the authors, you can submit it at https://huggingface.co/papers/submit.
The paper page lets people discuss your paper and find artifacts related to it (your data generation scripts for creating datasets for QA tasks, for instance). You can also claim the paper as yours, which will show up on your public profile at HF, and add GitHub and project page URLs.
It'd be awesome to also release the data generation scripts/example scripts to make it easier for researchers to generate training data for their models with it. This would allow people to load your framework directly from 🤗 Datasets, so that people can do:
from datasets import load_dataset
dataset = load_dataset("your-hf-org-or-username/your-generation-script")
See here for a guide: https://huggingface.co/docs/datasets/loading.
Besides that, there's the dataset viewer which allows people to quickly explore the first few rows of the data in the browser.
Let me know if you're interested/need any help regarding this!
Cheers,
Niels
ML Engineer @ HF 🤗
Hi @NielsRogge, I'm one of the authors, but I was unable to submit it at https://huggingface.co/papers/submit. The page says:
You can't submit a paper. Only authors with at least one paper on HF can submit to the Daily Paper. Check out how to claim authorship of a paper.
Would you please provide some help? Thanks.
Hi,
Sure, I've indexed your first paper on HF here: https://huggingface.co/papers/2505.20416. Feel free to claim it with your HF account, add the GitHub URL, and link the artifacts.
@NielsRogge
https://github.com/tpoisonooo/ROGRAG and https://github.com/open-sciencelab/SeedBench are also our work (all of them are ACL25). Can we submit them to hf.co/papers?
Yes, here they are:
- https://huggingface.co/papers/2503.06474
- https://huggingface.co/papers/2505.13220
Feel free to claim them with your HF account and add the GitHub and/or project page URLs. The latter has a nice dataset which could be made accessible on the Hub, so that people can do:
from datasets import load_dataset
dataset = load_dataset("your-hf-org/seedbench-corpus")