cosmopedia
Is the prompt used for scoring educational content part of this repo? Did you use Mixtral to score/classify the content, or was a dedicated classifier trained?
https://github.com/huggingface/cosmopedia/blob/main/deduplication/deduplicate_dataset.py
```
2024-02-22 14:17:57.759 | INFO | datatrove.executor.slurm:launch_job:216 - Launching dependency job "mh3"
2024-02-22 14:17:57.759 | INFO | datatrove.executor.slurm:launch_job:216 - Launching dependency job "mh2"
2024-02-22 14:17:57.759 | INFO | datatrove.executor.slurm:launch_job:216...
```
Wow, this is super cool work, and thanks for open-sourcing everything!! I wonder whether Cosmopedia tried incorporating code data as seeds and rephrasing it into high-quality data? We did...
Awesome work 🙂 Is there any plan to release the training code for cosmo-1b? Or at least details about what existing repos and framework tools were used?
Couldn't reach 'HuggingFaceTB/web_clusters' on the Hub (ConnectionError). How can I solve this problem?
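For reference, a minimal sketch of loading the dataset with the standard `datasets` API, assuming the default `train` split; the retry/offline hints in the comments are general Hub troubleshooting steps, not something confirmed by the authors:

```python
from datasets import load_dataset

# Minimal sketch: pull the clusters dataset straight from the Hub.
# A ConnectionError usually means the Hub was unreachable at that moment;
# retrying later, or downloading once and then relying on the local cache
# (e.g. with HF_HUB_OFFLINE=1), are the usual workarounds.
ds = load_dataset("HuggingFaceTB/web_clusters", split="train")
print(ds)
```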
Thank you for sharing. Common benchmarks like MMLU are typically run in a 5-shot setting to measure a model's in-context learning capabilities. Can you explain why the MMLU evaluations here use a zero-shot...
Can you share the prompt you used to produce the code-scoring data? I want to build a C and C++ dataset for pre-training using this prompt. Of course, if you have already...
Thanks for your great work! From https://github.com/huggingface/cosmopedia/tree/main/evaluation#benchmark-evaluation, is this the exact command you used for evaluation? I found that most of the tasks are 0-shot, which is inconsistent with the...
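To make the 0-shot vs. 5-shot distinction concrete, here is a toy sketch; it is not the repo's actual lighteval configuration, and the question text and the `build_prompt` helper are invented purely for illustration:

```python
# Toy illustration of zero-shot vs. few-shot prompting (not the repo's
# evaluation code); the question and the worked examples are made up.
def build_prompt(question: str, fewshot_examples: list[str]) -> str:
    """Prepend k solved examples to the question; k = 0 gives a zero-shot prompt."""
    return "\n\n".join(fewshot_examples + [question])

question = "Q: Which gas makes up most of Earth's atmosphere?\nA."
examples = [f"Q: example question {i}\nA. example answer {i}" for i in range(5)]

zero_shot = build_prompt(question, [])        # 0-shot: just the bare question
five_shot = build_prompt(question, examples)  # 5-shot: five worked examples first

print(zero_shot)
print("---")
print(five_shot)
```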