Alpaca-CoT
GPT-4 Instruction dataset
Take a look:
https://github.com/teknium1/GPTeacher
We will collect them soon. Thank you for your support.
This one is a mixture of other datasets, but it should contain a few new records. It has now landed on Hugging Face.
https://huggingface.co/datasets/swype/instruct
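In case it is useful to anyone, a minimal loading sketch with the `datasets` library (the split name and column layout are assumptions, please check the dataset card):

```python
# Minimal sketch (not an official loader): pull swype/instruct from the Hugging Face Hub.
# The split name and column layout are assumptions; confirm them on the dataset card.
from datasets import load_dataset

ds = load_dataset("swype/instruct", split="train")
print(ds)       # prints the column names and row count
print(ds[0])    # inspect the first record
```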
Thank you very much for the reminder. We'll collect it soon.
Here is another one: Alpaca, but generated with GPT-4. It includes Chinese translations :)
https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM#fine-tuning-with-the-data
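A minimal sketch for reading the GPT-4-generated Alpaca data after cloning the repo (the exact file name under data/ is an assumption, so adjust the path if it differs):

```python
# Minimal sketch, assuming the repo is cloned locally and ships Alpaca-style JSON.
# The path GPT-4-LLM/data/alpaca_gpt4_data.json is an assumption; adjust if needed.
import json

with open("GPT-4-LLM/data/alpaca_gpt4_data.json", encoding="utf-8") as f:
    records = json.load(f)

# Alpaca-style records are expected to carry "instruction", "input" and "output" fields.
print(len(records))
print(records[0])
```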
Related to your project, since you started out with chain-of-thought fine-tuning:
Researchers Alpaca-finetuned Galactica to produce Galpaca, which seems to reason better in scientific and technical domains than LLaMA:
https://twitter.com/oijna/status/1637566839235518464
https://huggingface.co/GeorgiaTechResearchInstitute/galpaca-30b
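For anyone who wants to try it, a rough sketch of running the checkpoint with transformers (the prompt format and memory settings are only assumptions; a 30B model needs a lot of GPU memory):

```python
# Minimal sketch: run the Galpaca-30B checkpoint with transformers.
# dtype/device_map settings are illustrative; device_map="auto" also requires accelerate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GeorgiaTechResearchInstitute/galpaca-30b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Question: What is the Planck constant used for?\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```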
I'll pay attention to these, thx.
This is all moving so insanely fast that I get confused.
https://github.com/databrickslabs/dolly/tree/master/data
Author description (not mine): "CAMEL datasets: Physics, Chemistry and Biology. Each dataset contains 20K problem-solution pairs, consisting of 25 topics, 25 subtopics and 32 problems for each (topic, subtopic) pair, generated and solved by GPT-4."
https://github.com/lightaime/camel#data-hosted-on-hugging-face
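A quick sketch to check the stated numbers and load one of the sets (the Hub repo id "camel-ai/physics" is an assumption; see the CAMEL README for the exact dataset names):

```python
# Sanity check on the stated structure: 25 topics x 25 subtopics x 32 problems
# per (topic, subtopic) pair gives exactly the advertised 20K examples.
assert 25 * 25 * 32 == 20_000

# Minimal loading sketch; the repo id below is an assumption, not confirmed here.
from datasets import load_dataset

physics = load_dataset("camel-ai/physics", split="train")
print(physics[0])
```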
https://github.com/DreamerGPT/DreamerGPT/tree/main/data