guozhiyao
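Loss value magnitude
What is the approximate loss value after training? I am pretraining a 13B model from scratch and am not sure what the final loss will converge to, so I would like to use your loss as a reference.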
### Question I saw that you provided two pre-training datasets, `CC-3M Concept-balanced 595K` and `LAION/CC/SBU BLIP-Caption Concept-balanced 558K`. What is the difference between these two datasets? Which one are you using?...
Have you computed statistics on the proportion of each language in the pretraining data?
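### Is there an existing issue for this? - [X] I have searched the existing issues ### Current Behavior Thank you for such excellent work. I would like to ask: apart from the improvements to the model architecture, what are the main changes that bring such a large performance gain? ### Expected Behavior _No response_ ### Steps To Reproduce None...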
Hi, professor, I am so excited about the results of your paper, and the idea inspired me a lot. I think it is awesome work. But I still...
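Hello, what is the relationship between the open-sourced 3.5M data and the earlier 2M data? Does the 3.5M include the 2M, or are they mutually exclusive? Also, does it include generated_chat_0.4M, school_math_0.25M, or multiturn_chat_0.8M?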
- base model: `alignment-handbook/zephyr-7b-sft-full` - train data: `UCLA-AGI/SPIN_iter0` I used the default hyperparameters to train the model and tested it with `HuggingFaceH4/open_llm_leaderboard` locally. The results on `allenai/ai2_arc` are as below:...
I tried to download from the URL, for example with `wget -P ./ -c https://data.together.xyz/redpajama-data-1T/v1.0.0/wikipedia/wiki.jsonl`, but it fails with the error below ``` 63888150K .......... .......... .......... .......... .......... 54% 26.5M 82m10s...