Kindly add DeepSeek family for training
⚠️ Please check that this feature request hasn't been suggested before.
- [X] I searched previous Ideas in Discussions and didn't find any similar feature requests.
- [X] I searched previous Issues and didn't find any similar feature requests.
🔖 Feature description
Support in Axolotl for finetuning DeepSeek models. Thanks!
✔️ Solution
SFT, LoRA, and QLoRA will suffice. It would be great to have these models available in the Axolotl training platform.
❓ Alternatives
No response
📝 Additional Context
No response
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this feature has not been requested yet.
- [X] I have provided enough information for the maintainers to understand and evaluate this request.
Hey, as mentioned here https://github.com/OpenAccess-AI-Collective/axolotl/discussions/1171, it's llama-based, so you can use the llama configs :)
Very cool, thank you!
It doesn't look like it:
https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/config.json
{
  "architectures": [
    "DeepseekForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_deepseek.DeepseekConfig",
    "AutoModel": "modeling_deepseek.DeepseekModel",
    "AutoModelForCausalLM": "modeling_deepseek.DeepseekForCausalLM"
  },
  "bos_token_id": 100000,
  "eos_token_id": 100001,
If it were a Llama model, the config would say "architectures": ["LlamaForCausalLM"], right?
Oh, I wasn't aware of that model. I thought they were referencing models such as https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct/tree/main, which is llama-based.
Thanks @ehartford for spotting it and reopening the issue. I hope the Axolotl team will come up with a solution.
@ajinkya123-robo, in the meantime, you can just use AutoModelForCausalLM and AutoTokenizer with an existing config and point it at your model (?). Unfortunately, in this case sample packing isn't available yet.
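For reference, a minimal sketch of what that could look like, assuming deepseek-ai/deepseek-moe-16b-base as the base model (untested; the adapter, sequence-length, and dataset lines are placeholders rather than values verified for this model, and the remaining settings can be carried over from an existing example config):

base_model: deepseek-ai/deepseek-moe-16b-base
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true   # the checkpoint ships custom modeling code (see auto_map above)

load_in_4bit: true
adapter: qlora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

sequence_len: 2048
sample_packing: false     # as noted above, sample packing isn't available for this architecture yet

datasets:
  - path: mhenrichsen/alpaca_2k_test   # placeholder dataset, swap in your own
    type: alpaca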
@NanoCode012 I am not sure if it will work out, but I will give it a try over the weekend.
Currently experimenting with training DeepSeek Coder and stumbled on this thread when I ran into:
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 209, in get_spm_processor
tokenizer.Load(self.vocab_file)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/sentencepiece/__init__.py", line 961, in Load
return self.LoadFromFile(model_file)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
Traceback (most recent call last):
File "/root/miniconda3/envs/py3.10/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
simple_launcher(args)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
Using the examples/llama-2/qlora.yml as a reference and changing the following:
- base_model: NousResearch/Llama-2-7b-hf
- model_type: LlamaForCausalLM
- tokenizer_type: LlamaTokenizer
+ base_model: deepseek-ai/deepseek-coder-6.7b-instruct
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
seems to work, thanks!
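For anyone else hitting the TypeError above: the slow LlamaTokenizer tries to load a SentencePiece tokenizer.model file, which the deepseek-coder repos apparently don't ship (they only provide a tokenizer.json), so switching tokenizer_type to AutoTokenizer lets transformers pick the fast tokenizer instead. With the edited config saved as, say, qlora.yml, training then launches the usual way, e.g. accelerate launch -m axolotl.cli.train qlora.yml.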
I did some experiments with deepseek-coder-v2, which works using:
base_model: deepseek-ai/DeepSeek-Coder-V2-Lite-Base
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true
special_tokens:
  pad_token: "<|EOT|>"
  bos_token: "<|begin▁of▁sentence|>"
  eos_token: "<|end▁of▁sentence|>"
Multipack support was added recently in https://github.com/axolotl-ai-cloud/axolotl/pull/1712, although it wasn't working for me:
File "/root/micromamba/envs/dev/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 90, in _flash_attn_varlen_forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
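Not a fix, but if anyone else runs into this crash, one thing to try while the kernel issue is investigated is falling back to a configuration that avoids the varlen flash-attention path, i.e. turning off multipack (and, if it still crashes, flash attention itself); both are existing axolotl config flags:

sample_packing: false   # disables multipack, which is what exercises flash_attn_varlen_forward
flash_attention: false  # optional fallback if the crash persists with packing disabled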