Kindly add DeepSeek family for training
⚠️ Please check that this feature request hasn't been suggested before.
- [X] I searched previous Ideas in Discussions and didn't find any similar feature requests.
- [X] I searched previous Issues and didn't find any similar feature requests.
🔖 Feature description
Support in Axolotl for finetuning DeepSeek models. Thanks!
✔️ Solution
SFT, LoRA, and QLoRA will suffice. It would be great to have these models available in the Axolotl training platform.
❓ Alternatives
No response
📝 Additional Context
No response
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this feature has not been requested yet.
- [X] I have provided enough information for the maintainers to understand and evaluate this request.
Hey, as mentioned here https://github.com/OpenAccess-AI-Collective/axolotl/discussions/1171, it's llama-based, so you can use the llama configs :)
Very cool, thank you!
It doesn't look like it:
https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/config.json
{
  "architectures": [
    "DeepseekForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_deepseek.DeepseekConfig",
    "AutoModel": "modeling_deepseek.DeepseekModel",
    "AutoModelForCausalLM": "modeling_deepseek.DeepseekForCausalLM"
  },
  "bos_token_id": 100000,
  "eos_token_id": 100001,
If it were a Llama model, the config would say "architectures": ["LlamaForCausalLM"], right?
Oh, I wasn't aware of that model. I thought they were referencing models such as https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct/tree/main, which is llama-based.
Thanks @ehartford for spotting it and reopening the issue. I hope the Axolotl team will come up with a solution.
@ajinkya123-robo, in the meantime, you can just use AutoModelForCausalLM and AutoTokenizer with an existing config and point it at your model (?). Unfortunately, in this case sample packing isn't available yet.
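For reference, a minimal sketch of what that could look like, assuming deepseek-ai/deepseek-moe-16b-base as the base model (untested; the adapter, sequence-length, and dataset lines are placeholders rather than values verified for this model, and the remaining settings can be carried over from an existing example config):

base_model: deepseek-ai/deepseek-moe-16b-base
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true   # the checkpoint ships custom modeling code (see auto_map above)

load_in_4bit: true
adapter: qlora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

sequence_len: 2048
sample_packing: false     # as noted above, sample packing isn't available for this architecture yet

datasets:
  - path: mhenrichsen/alpaca_2k_test   # placeholder dataset, swap in your own
    type: alpaca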
@NanoCode012 I am not sure if it will work out, but I will give it a try over the weekend.
Currently experimenting with training DeepSeek Coder and stumbled on this thread when I ran into:
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 209, in get_spm_processor
tokenizer.Load(self.vocab_file)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/sentencepiece/__init__.py", line 961, in Load
return self.LoadFromFile(model_file)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
Traceback (most recent call last):
File "/root/miniconda3/envs/py3.10/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
simple_launcher(args)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
Using the examples/llama-2/qlora.yml as a reference and changing the following:
- base_model: NousResearch/Llama-2-7b-hf
- model_type: LlamaForCausalLM
- tokenizer_type: LlamaTokenizer
+ base_model: deepseek-ai/deepseek-coder-6.7b-instruct
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
seems to work, thanks!
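For anyone else hitting the TypeError above: the slow LlamaTokenizer tries to load a SentencePiece tokenizer.model file, which the deepseek-coder repos apparently don't ship (they only provide a tokenizer.json), so switching tokenizer_type to AutoTokenizer lets transformers pick the fast tokenizer instead. With the edited config saved as, say, qlora.yml, training then launches the usual way, e.g. accelerate launch -m axolotl.cli.train qlora.yml.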
I did some experiments with deepseek-coder-v2, which works using:
base_model: deepseek-ai/DeepSeek-Coder-V2-Lite-Base
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true
special_tokens:
  pad_token: "<|EOT|>"
  bos_token: "<|begin▁of▁sentence|>"
  eos_token: "<|end▁of▁sentence|>"
Multipack support was added recently in https://github.com/axolotl-ai-cloud/axolotl/pull/1712, although it wasn't working for me:
File "/root/micromamba/envs/dev/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 90, in _flash_attn_varlen_forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
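Not a fix, but if anyone else runs into this crash, one thing to try while the kernel issue is investigated is falling back to a configuration that avoids the varlen flash-attention path, i.e. turning off multipack (and, if it still crashes, flash attention itself); both are existing axolotl config flags:

sample_packing: false   # disables multipack, which is what exercises flash_attn_varlen_forward
flash_attention: false  # optional fallback if the crash persists with packing disabled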