Alpaca-CoT
GPT-4 Instruction dataset
Take a look:
https://github.com/teknium1/GPTeacher
We will collect them soon. Thank you for your support.
This one is a mixture of other datasets, but it should contain a few new records. It has now landed on Hugging Face.
https://huggingface.co/datasets/swype/instruct
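In case it is useful to anyone, a minimal loading sketch with the `datasets` library (the split name and column layout are assumptions, please check the dataset card):

```python
# Minimal sketch (not an official loader): pull swype/instruct from the Hugging Face Hub.
# The split name and column layout are assumptions; confirm them on the dataset card.
from datasets import load_dataset

ds = load_dataset("swype/instruct", split="train")
print(ds)       # prints the column names and row count
print(ds[0])    # inspect the first record
```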
Thank you very much for the reminder. We'll collect it soon.
Here is another one: Alpaca, but generated with GPT-4. It includes Chinese translations :)
https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM#fine-tuning-with-the-data
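A minimal sketch for reading the GPT-4-generated Alpaca data after cloning the repo (the exact file name under data/ is an assumption, so adjust the path if it differs):

```python
# Minimal sketch, assuming the repo is cloned locally and ships Alpaca-style JSON.
# The path GPT-4-LLM/data/alpaca_gpt4_data.json is an assumption; adjust if needed.
import json

with open("GPT-4-LLM/data/alpaca_gpt4_data.json", encoding="utf-8") as f:
    records = json.load(f)

# Alpaca-style records are expected to carry "instruction", "input" and "output" fields.
print(len(records))
print(records[0])
```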
Related to your project, since you started out with chain-of-thought fine-tuning:
Researchers Alpaca-finetuned Galactica to produce Galpaca, which seems to reason better in scientific and technical domains than LLaMA:
https://twitter.com/oijna/status/1637566839235518464
https://huggingface.co/GeorgiaTechResearchInstitute/galpaca-30b
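For anyone who wants to try it, a rough sketch of running the checkpoint with transformers (the prompt format and memory settings are only assumptions; a 30B model needs a lot of GPU memory):

```python
# Minimal sketch: run the Galpaca-30B checkpoint with transformers.
# dtype/device_map settings are illustrative; device_map="auto" also requires accelerate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GeorgiaTechResearchInstitute/galpaca-30b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Question: What is the Planck constant used for?\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```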
I'll pay attention to these, thx.
This is all moving so insanely fast that I get confused.
https://github.com/databrickslabs/dolly/tree/master/data
Author description (not mine): "CAMEL datasets: Physics, Chemistry and Biology. Each dataset contains 20K problem-solution pairs, consisting of 25 topics, 25 subtopics and 32 problems for each (topic, subtopic) pair, generated and solved by GPT-4."
https://github.com/lightaime/camel#data-hosted-on-hugging-face
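A quick sketch to check the stated numbers and load one of the sets (the Hub repo id "camel-ai/physics" is an assumption; see the CAMEL README for the exact dataset names):

```python
# Sanity check on the stated structure: 25 topics x 25 subtopics x 32 problems
# per (topic, subtopic) pair gives exactly the advertised 20K examples.
assert 25 * 25 * 32 == 20_000

# Minimal loading sketch; the repo id below is an assumption, not confirmed here.
from datasets import load_dataset

physics = load_dataset("camel-ai/physics", split="train")
print(physics[0])
```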
https://github.com/DreamerGPT/DreamerGPT/tree/main/data