mistral-inference
Missing model card / data sheet with info on pretraining and RLHF datasets
At opening-up-chatgpt.github.io we're documenting data sources and degrees of openness along several dimensions for instruction-tuned LLMs. I am looking for information about (1) the pretraining dataset and (2) the RLHF datasets, but have not found any details. The HuggingFace model card says:

"For full details of this model please read our release blog post"
The release blog post provides no information on this at present.
Information on the language composition of the pretraining dataset would also be welcome, as there is no mention of multilingual capabilities in the linked blog post.
I would like to work on this project!
FWIW, Mistral currently sits in the bottom 5 of the live tracker of LLM openness.