Release data generation scripts of GraphGen, for generating QA datasets on Hugging Face

Open NielsRogge opened this issue 7 months ago • 4 comments

Hi @ChenZiHong-Gavin 🤗

I'm Niels and work as part of the open-source team at Hugging Face. I discovered your work on arXiv and was wondering whether you would like to submit it to hf.co/papers to improve its discoverability. If you are one of the authors, you can submit it at https://huggingface.co/papers/submit.

The paper page lets people discuss your paper and find artifacts related to it (your data generation scripts for creating datasets for QA tasks, for instance). You can also claim the paper as yours, which will show up on your public profile at HF, and add GitHub and project page URLs.

It'd be awesome to also release the data generation scripts/example scripts to make it easier for researchers to generate training data for their models with it. This would allow people to load your framework directly from 🤗 Datasets, so that people can do:

from datasets import load_dataset

dataset = load_dataset("your-hf-org-or-username/your-generation-script")

See here for a guide: https://huggingface.co/docs/datasets/loading.

Besides that, there's the dataset viewer which allows people to quickly explore the first few rows of the data in the browser.
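
As a rough illustration of the data side of this suggestion, generated QA pairs can be saved as a JSON Lines file, a format that 🤗 Datasets can read locally via load_dataset("json", data_files=...). The sketch below uses only the Python standard library; the records, field names, and the train.jsonl file name are hypothetical placeholders and not GraphGen's actual output schema. Reading back a limited number of rows loosely mirrors previewing the first few rows in the browser.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical placeholder records; the field names are an assumption,
# not the schema produced by GraphGen.
qa_pairs = [
    {"question": "placeholder question 1", "answer": "placeholder answer 1"},
    {"question": "placeholder question 2", "answer": "placeholder answer 2"},
]


def write_jsonl(records, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path, limit=None):
    """Read back at most `limit` records (all records if limit is None)."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if limit is not None and i >= limit:
                break
            rows.append(json.loads(line))
    return rows


# Write to a temporary directory so the sketch leaves the working tree untouched.
out_path = Path(tempfile.mkdtemp()) / "train.jsonl"
write_jsonl(qa_pairs, out_path)

# Preview only the first row.
preview = read_jsonl(out_path, limit=1)
print(preview)
```

A file like this could then be uploaded to a dataset repository on the Hub, after which the load_dataset call shown above would point at that repository instead of a local path.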

Let me know if you're interested/need any help regarding this!

Cheers,

Niels ML Engineer @ HF 🤗

NielsRogge, May 28 '25 10:05

Hi @NielsRogge, I'm one of the authors, but I couldn't submit it at https://huggingface.co/papers/submit because it says:

You can't submit a paper. Only authors with at least one paper on HF can submit to the Daily Paper. Check out how to claim authorship of a paper.

Would you please provide some help? Thanks.

ChenZiHong-Gavin, May 29 '25 03:05

Hi,

Sure, I've indexed your first paper on HF here: https://huggingface.co/papers/2505.20416. Feel free to claim it with your HF account, add the GitHub URL, and link the artifacts.

NielsRogge, May 29 '25 08:05

@NielsRogge

https://github.com/tpoisonooo/ROGRAG and https://github.com/open-sciencelab/SeedBench are also our work (all of them are ACL25). Can we submit them to hf.co/papers?

tpoisonooo, May 29 '25 10:05

Yes, here they are:

  • https://huggingface.co/papers/2503.06474
  • https://huggingface.co/papers/2505.13220

Feel free to claim them with your HF account and add the GitHub and/or project page URLs. The latter has a nice dataset which could be made accessible on the Hub, so that people can do:

from datasets import load_dataset

dataset = load_dataset("your-hf-org/seedbench-corpus")

NielsRogge, May 29 '25 11:05