
Kindly add DeepSeek family for training

Open ajinkya123-robo opened this issue 1 year ago • 8 comments

⚠️ Please check that this feature request hasn't been suggested before.

  • [X] I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • [X] I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

Support for DeepSeek model fine-tuning in Axolotl. Thanks!

✔️ Solution

SFT, LoRA, and QLoRA will suffice. It would be great to have these models available in the Axolotl training platform.

❓ Alternatives

No response

📝 Additional Context

No response

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this feature has not been requested yet.
  • [X] I have provided enough information for the maintainers to understand and evaluate this request.

ajinkya123-robo avatar Feb 27 '24 05:02 ajinkya123-robo

Hey, as noted in https://github.com/OpenAccess-AI-Collective/axolotl/discussions/1171, it's llama-based, so you can use the llama configs :)

NanoCode012 avatar Feb 27 '24 07:02 NanoCode012

Very cool, thank you!

ajinkya123-robo avatar Feb 27 '24 11:02 ajinkya123-robo

It doesn't look that way:

https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/config.json

{
  "architectures": [
    "DeepseekForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_deepseek.DeepseekConfig",
    "AutoModel": "modeling_deepseek.DeepseekModel",
    "AutoModelForCausalLM": "modeling_deepseek.DeepseekForCausalLM"
  },
  "bos_token_id": 100000,
  "eos_token_id": 100001,

If it were Llama-based, it would say "architectures": ["LlamaForCausalLM"], right?

ehartford avatar Feb 27 '24 20:02 ehartford

Oh, I wasn't aware of that model. I thought they were referring to models such as https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct/tree/main, which is llama-based.

NanoCode012 avatar Feb 28 '24 09:02 NanoCode012

Thanks @ehartford for spotting it and reopening the issue. I hope the Axolotl team will come up with a solution.

ajinkya123-robo avatar Feb 28 '24 17:02 ajinkya123-robo

@ajinkya123-robo, in the meantime, you can likely just use AutoModelForCausalLM and AutoTokenizer with an existing config and point it at your model. Unfortunately, sample packing isn't available for that path yet.
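
Something along these lines should work as a starting point (a sketch only; key names follow axolotl's example configs, and this is untested with the MoE model):

base_model: deepseek-ai/deepseek-moe-16b-base
model_type: AutoModelForCausalLM   # let transformers resolve the custom architecture
tokenizer_type: AutoTokenizer
trust_remote_code: true            # the auto_map in config.json above points at custom modeling code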

NanoCode012 avatar Feb 28 '24 19:02 NanoCode012

@NanoCode012 I am not sure if it will work out, but I will give it a try over the weekend.

ajinkya123-robo avatar Feb 29 '24 04:02 ajinkya123-robo

I'm currently experimenting with training DeepSeek Coder and stumbled on this thread when I ran into:

  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 209, in get_spm_processor
    tokenizer.Load(self.vocab_file)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

The TypeError is likely because LlamaTokenizer expects a sentencepiece tokenizer.model file, which the DeepSeek Coder repos don't ship (they only include a fast-tokenizer tokenizer.json), so self.vocab_file is None; switching to AutoTokenizer avoids it. Using examples/llama-2/qlora.yml as a reference and changing the following:

- base_model: NousResearch/Llama-2-7b-hf
- model_type: LlamaForCausalLM
- tokenizer_type: LlamaTokenizer
+ base_model: deepseek-ai/deepseek-coder-6.7b-instruct
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer

seems to work, thanks!
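
Putting those lines together, the top of the resulting config would look roughly like this (a sketch based on the stock examples/llama-2/qlora.yml; all other keys left unchanged):

base_model: deepseek-ai/deepseek-coder-6.7b-instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_4bit: true   # QLoRA: 4-bit quantized base weights, as in the stock example
adapter: qlora       # train LoRA adapters on top of the quantized model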

ZaneH avatar Mar 31 '24 19:03 ZaneH

I did some experiments with deepseek-coder-v2, which works using:

base_model: deepseek-ai/DeepSeek-Coder-V2-Lite-Base
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true

special_tokens:
  pad_token: "<|EOT|>"
  bos_token: "<|begin▁of▁sentence|>"
  eos_token: "<|end▁of▁sentence|>"

Multipack support was added recently in https://github.com/axolotl-ai-cloud/axolotl/pull/1712, although it wasn't working for me:

File "/root/micromamba/envs/dev/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 90, in _flash_attn_varlen_forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
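
For reference, sample packing is typically enabled with flags along these lines (a sketch; key names assumed from axolotl's llama examples, and the sequence length here is illustrative):

sample_packing: true      # pack multiple short samples into one sequence (multipack)
pad_to_sequence_len: true
flash_attention: true     # multipack uses flash-attn's varlen kernels (the call that crashed above)
sequence_len: 4096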

tmm1 avatar Aug 21 '24 05:08 tmm1