
Results: 24 comments by i4never

The TigerBot models are based on the llama-2 architecture, and [vllm](https://github.com/vllm-project/vllm) already supports the meta-llama/Llama-2-70b-hf architecture, so you can follow vllm's quickstart. Also, the need to adapt vllm usually comes from wanting to serve the model; in that case consider [TGI](https://github.com/huggingface/text-generation-inference). TGI's flash_llama_modeling integrates both flash_attn and vllm: the first token is generated with flash_attn, and subsequent tokens use vllm. https://github.com/huggingface/text-generation-inference/blob/3238c49121b02432bf2938c6ebfd44f06c5adc2f/server/text_generation_server/models/custom_modeling/flash_llama_modeling.py#L291-L313

```shell
python train_with_qloara.py \
  --model_name_or_path TigerResearch/tigerbot-7b-chat \
  --data_files ./*.jsonl \
  --do_train \
  --output_dir ./tigerbot-7b-chat-qlora \
  --num_train_epochs 3 \
  --learning_rate 2e-5 \
  --save_strategy "steps" \
  --save_steps 100 \
  --logging_steps...
```

@zhangfan-algo You could try zero3 + offload. We have not run this configuration ourselves, but full-parameter fine-tuning will most likely work with it.
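A minimal sketch of what a DeepSpeed ZeRO-3 + CPU-offload config could look like; the exact values (batch size, precision) are assumptions you should adapt to your own setup, not a tested configuration for this model:

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true }
  },
  "bf16": { "enabled": true },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```

Pass it to the launcher with `--deepspeed ds_config.json`; offloading optimizer states and parameters to CPU trades step time for GPU memory, which is what makes full fine-tuning feasible here.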

Two streams help overlap communication and computation: the second stream can start processing the next chunk of data as soon as it is received, while the first stream is still working on the current chunk.
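The overlap pattern can be sketched in plain Python with a one-worker thread pool standing in for the copy stream; `receive` and `compute` are hypothetical stand-ins for the actual transfer and kernel work, not CUDA calls:

```python
# Double-buffered overlap: while we compute on chunk i, a separate
# "copy stream" (here a one-worker thread pool) prefetches chunk i+1.
from concurrent.futures import ThreadPoolExecutor
import time

def receive(i):
    # Stand-in for an async copy / network receive of chunk i.
    time.sleep(0.01)
    return list(range(i * 4, i * 4 + 4))

def compute(chunk):
    # Stand-in for the computation launched on the compute stream.
    time.sleep(0.01)
    return sum(chunk)

def pipelined(n_chunks):
    results = []
    with ThreadPoolExecutor(max_workers=1) as copy_stream:
        nxt = copy_stream.submit(receive, 0)  # prefetch chunk 0
        for i in range(n_chunks):
            chunk = nxt.result()              # wait for the in-flight copy
            if i + 1 < n_chunks:
                # Start receiving the next chunk before computing,
                # so the copy overlaps with compute(chunk) below.
                nxt = copy_stream.submit(receive, i + 1)
            results.append(compute(chunk))
    return results

print(pipelined(3))  # sums of chunks [0..3], [4..7], [8..11] -> [6, 22, 38]
```

With real CUDA streams the structure is the same: issue the next copy on one stream, launch the kernel for the current chunk on the other, and synchronize before consuming each chunk.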