CoLLiE
Collaborative Training of Large Language Models in an Efficient Way
Training llama2 70B with the latest dev branch code hits the following problem:
```
collie/collie/models/llama/model.py:203 in _forward

  200 │                     .permute(0, 2, 1, 4, 3) \
  201 │ ...
```
Dear authors, I found that collie cannot initialize DeepSpeed when using models from the transformers library. For example, when replacing [this line](https://github.com/OpenLMLab/collie/blob/main/examples/finetune_llama_for_summary.py#L81) of the script with the `from_pretrained` interface of...
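For concreteness, a hedged sketch of the substitution being described; the model path is a placeholder, and the commented-out original is an approximation of the example script, not a verbatim copy:

```python
# Hypothetical illustration of the swap: collie's own model class replaced
# by the plain transformers interface, which reportedly breaks DeepSpeed init.
from transformers import AutoModelForCausalLM

# original in the example script (approximate):
# model = LlamaForCausalLM.from_pretrained("/path/to/llama", config=config)

# substitution described in this report:
model = AutoModelForCausalLM.from_pretrained("/path/to/llama")
```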
Using the dev branch and the script in examples/further_pretrain_llama, launched with
```
torchrun --rdzv_backend=c10d --rdzv_endpoint=localhost:29402 --nnodes=1 --nproc_per_node=8 expand_vocab.py
```
Only the llama paths were changed (config, tokenizer, and model.from_pretrained). It fails with:
```
Traceback (most recent call last)
/d2/data/chuxiong/collie/examples/further_pretrain_llama/expand_vocab.py:85 in ...
  82 │ model.get_input_embedding()[1].weight.requires_grad = True
...
```
Hi authors, is [lr = trainer.lr_scheduler.step(global_step)](https://github.com/OpenLMLab/collie/blob/main/collie/controller/trainer.py#L413) implemented for Lomo in the Trainer? If so, how do I enable it? Thanks!
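For context, the quoted line suggests a scheduler whose `step(global_step)` both applies and returns the new learning rate. Below is a minimal sketch of a scheduler matching that assumed interface; it is an illustration, not a class shipped with collie:

```python
# Hypothetical scheduler matching the interface implied by
# `lr = trainer.lr_scheduler.step(global_step)`: linear warmup, then linear decay.
class LinearWarmupDecay:
    def __init__(self, optimizer, peak_lr, warmup_steps, total_steps):
        self.optimizer = optimizer
        self.peak_lr = peak_lr
        self.warmup_steps = warmup_steps
        self.total_steps = total_steps

    def step(self, global_step):
        if global_step < self.warmup_steps:
            lr = self.peak_lr * global_step / max(1, self.warmup_steps)
        else:
            progress = (global_step - self.warmup_steps) / max(
                1, self.total_steps - self.warmup_steps
            )
            lr = self.peak_lr * max(0.0, 1.0 - progress)
        # push the new lr into the optimizer's param groups and return it
        for group in self.optimizer.param_groups:
            group["lr"] = lr
        return lr
```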
Can ZeRO-3 be used together with model parallelism? In my attempt I set
```
config.use_flash = False
config.tp_size = 4
config.ds_config = {
    "fp16": {
        "enabled": True
    },
    "zero_allow_untested_optimizer": True,
    "zero_force_ds_cpu_optimizer": False,
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": False
...
```
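For reference, a hedged sketch of what a complete config along these lines might look like; the model path, dp_size, and the closing braces are assumptions filled in for illustration, and this does not assert that ZeRO-3 composes with tensor parallelism in collie:

```python
# Hedged sketch: a full CollieConfig combining tensor parallelism with
# ZeRO-3 + CPU optimizer offload. "/path/to/llama" and dp_size are placeholders.
from collie import CollieConfig

config = CollieConfig.from_pretrained("/path/to/llama")
config.use_flash = False
config.tp_size = 4  # tensor parallel degree
config.dp_size = 2  # ZeRO shards across this data-parallel group (assumes 8 GPUs total)
config.ds_config = {
    "fp16": {"enabled": True},
    "zero_allow_untested_optimizer": True,
    "zero_force_ds_cpu_optimizer": False,
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": False},
    },
}
```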
MOSS2 is in alpha testing.