Have you tried Transformers or vLLM? PyTorch compatibility with CUDA 12.3 is experimental.
It's 3.5-0106, because it uses OpenChat 3.2's conversation template and the Mistral base model.
`Correct` means verified correct answers. In addition, `GPT4` and `Human` were also used, indicating data with unknown correctness.
@bpucla 1. Yes, along with `Human User` / `Human Assistant`. 2. Yes; GPT-3.5 data is discarded in the 3.5 version.
Yes, it's deprecated now. Use `GPT4 Correct User` for best coding performance.
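For reference, a minimal sketch of building a single-turn prompt with the `GPT4 Correct User` template (the `<|end_of_turn|>` separator follows the published OpenChat 3.5 template; the helper function name is my own):

```python
def build_openchat_prompt(user_message: str) -> str:
    """Format a single-turn prompt using the GPT4 Correct template.

    `GPT4 Correct User` / `GPT4 Correct Assistant` are the role tags,
    and `<|end_of_turn|>` is the turn-separator special token.
    """
    return (
        f"GPT4 Correct User: {user_message}<|end_of_turn|>"
        "GPT4 Correct Assistant:"
    )

prompt = build_openchat_prompt("Write a hello-world in C.")
```

In practice you would pass `prompt` to your tokenizer/engine; multi-turn conversations repeat the same pattern per turn.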
`max_new_tokens` limits the number of tokens the model generates: if the output would exceed 300 tokens, generation stops there. If you want shorter responses, you may prompt the model to...
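To illustrate the cutoff semantics, here is a toy decoding loop (an illustration only, not the actual generation code): decoding ends at the EOS token or after `max_new_tokens` tokens, whichever comes first.

```python
def generate(next_token_fn, max_new_tokens: int, eos_token: str = "<eos>"):
    """Toy decoding loop: emit tokens from next_token_fn until it returns
    the EOS token, or until max_new_tokens tokens have been produced."""
    tokens = []
    for _ in range(max_new_tokens):
        tok = next_token_fn(tokens)
        if tok == eos_token:
            break
        tokens.append(tok)
    return tokens

# A "model" that never emits EOS is cut off exactly at the limit:
endless = lambda toks: "word"
assert len(generate(endless, max_new_tokens=300)) == 300
```

Note the cutoff is a hard truncation, which is why prompting for brevity is the better way to get genuinely shorter answers.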
Sorry for the inconvenience. We've updated it, should be published now.
When this parameter is enabled, losses are averaged per sequence; otherwise they are averaged per token (the same as the HF trainer). It is disabled by default because it causes worse...
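The difference can be sketched on plain Python lists (hypothetical per-token loss values, not the trainer code itself):

```python
def per_token_average(batch_losses):
    """Pool all tokens in the batch and average once (HF-trainer style).
    Long sequences contribute more tokens, so they weigh more."""
    flat = [loss for seq in batch_losses for loss in seq]
    return sum(flat) / len(flat)

def per_sequence_average(batch_losses):
    """Average each sequence first, then average the sequence means.
    Every sequence gets equal weight regardless of its length."""
    means = [sum(seq) / len(seq) for seq in batch_losses]
    return sum(means) / len(means)

# One long low-loss sequence and one short high-loss sequence:
batch = [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0]]
# per-token: (4*1.0 + 2*3.0) / 6 ≈ 1.67; per-sequence: (1.0 + 3.0) / 2 = 2.0
```

Per-sequence averaging up-weights short sequences relative to per-token averaging, which is the behavioral difference the flag toggles.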
It may be a network problem, try downloading again? If it still fails, please paste the error message.
Thanks for your report! Yes, this is exactly the issue. Because SFT datasets are often small and CPU RAM is abundant, we prefetch the dataset into memory. This implementation takes...