
Results: 24 comments by i4never

The TigerBot models are based on the llama-2 architecture, and [vllm](https://github.com/vllm-project/vllm) already supports the meta-llama/Llama-2-70b-hf architecture, so you can follow vllm's quickstart. Also, the need to adapt vllm usually comes from wanting to serve the model; in that case consider [TGI](https://github.com/huggingface/text-generation-inference). TGI's flash_llama_modeling integrates both flash_attn and vllm: the first token is generated with flash_attn, and subsequent tokens use vllm. https://github.com/huggingface/text-generation-inference/blob/3238c49121b02432bf2938c6ebfd44f06c5adc2f/server/text_generation_server/models/custom_modeling/flash_llama_modeling.py#L291-L313

```shell
python train_with_qloara.py \
  --model_name_or_path TigerResearch/tigerbot-7b-chat \
  --data_files ./*.jsonl \
  --do_train \
  --output_dir ./tigerbot-7b-chat-qlora \
  --num_train_epochs 3 \
  --learning_rate 2e-5 \
  --save_strategy "steps" \
  --save_steps 100 \
  --logging_steps...
```

@zhangfan-algo You could try zero3 + offload. We have not run this configuration ourselves, but full-parameter fine-tuning will most likely work with it.
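A minimal sketch of what a DeepSpeed ZeRO-3 + CPU-offload config could look like; the exact values (batch size, precision) are assumptions you should adapt to your own setup, not a tested configuration for this model:

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true }
  },
  "bf16": { "enabled": true },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```

Pass it to the launcher with `--deepspeed ds_config.json`; offloading optimizer states and parameters to CPU trades step time for GPU memory, which is what makes full fine-tuning feasible here.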

Two streams help overlap communication and computation: the second stream can start processing the next chunk of data as soon as it is received, while the first stream is still working on the current chunk.
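The overlap pattern can be sketched in plain Python with a one-worker thread pool standing in for the copy stream; `receive` and `compute` are hypothetical stand-ins for the actual transfer and kernel work, not CUDA calls:

```python
# Double-buffered overlap: while we compute on chunk i, a separate
# "copy stream" (here a one-worker thread pool) prefetches chunk i+1.
from concurrent.futures import ThreadPoolExecutor
import time

def receive(i):
    # Stand-in for an async copy / network receive of chunk i.
    time.sleep(0.01)
    return list(range(i * 4, i * 4 + 4))

def compute(chunk):
    # Stand-in for the computation launched on the compute stream.
    time.sleep(0.01)
    return sum(chunk)

def pipelined(n_chunks):
    results = []
    with ThreadPoolExecutor(max_workers=1) as copy_stream:
        nxt = copy_stream.submit(receive, 0)  # prefetch chunk 0
        for i in range(n_chunks):
            chunk = nxt.result()              # wait for the in-flight copy
            if i + 1 < n_chunks:
                # Start receiving the next chunk before computing,
                # so the copy overlaps with compute(chunk) below.
                nxt = copy_stream.submit(receive, i + 1)
            results.append(compute(chunk))
    return results

print(pipelined(3))  # sums of chunks [0..3], [4..7], [8..11] -> [6, 22, 38]
```

With real CUDA streams the structure is the same: issue the next copy on one stream, launch the kernel for the current chunk on the other, and synchronize before consuming each chunk.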