CoLLiE

Collaborative Training of Large Language Models in an Efficient Way

26 CoLLiE issues, sorted by recently updated

Training llama2 70B with the latest dev branch code hits the following problem:

```
collie/collie/models/llama/model.py:203 in _forward

  200 │                         .permute(0, 2, 1, 4, 3) \
  201 │ ...
```

Dear authors, I found that CoLLiE cannot initialize DeepSpeed when using models from the transformers library. For example, when replacing [this line](https://github.com/OpenLMLab/collie/blob/main/examples/finetune_llama_for_summary.py#L81) of the script with the `from_pretrained` interface of...

help wanted
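
A minimal sketch of the substitution this report describes; the checkpoint path is a placeholder, and the CoLLiE line being replaced is only paraphrased from the linked example:

```python
# Hypothetical repro sketch: loading the model through HuggingFace
# transformers instead of CoLLiE's own class, which reportedly breaks
# DeepSpeed initialization. The checkpoint path is a placeholder.
from transformers import AutoModelForCausalLM

# CoLLiE's example loads the model via its own wrapper, roughly:
#   model = LlamaForCausalLM.from_pretrained(pretrained_path, config=config)
# The report swaps that for the plain transformers interface:
model = AutoModelForCausalLM.from_pretrained("path/to/llama-checkpoint")
```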

Using the dev branch and the script in examples/further_pretrain_llama, launched with

```
torchrun --rdzv_backend=c10d --rdzv_endpoint=localhost:29402 --nnodes=1 --nproc_per_node=8 expand_vocab.py
```

I only changed the llama paths (config, tokenizer, and model.from_pretrained). It fails with:

```
Traceback (most recent call last)
/d2/data/chuxiong/collie/examples/further_pretrain_llama/expand_vocab.py:85 in

  82 │ model.get_input_embedding()[1].weight.requires_grad = True...
```

bug
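
For context, vocabulary expansion generally means adding tokens to the tokenizer and resizing the model's embedding matrices. A generic sketch using the transformers API (for illustration only; CoLLiE's `expand_vocab.py` uses its own `get_input_embedding` interface, and the paths here are placeholders):

```python
# Generic vocab-expansion sketch with HuggingFace transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/llama")  # placeholder path
model = AutoModelForCausalLM.from_pretrained("path/to/llama")

tokenizer.add_tokens(["<new_token_1>", "<new_token_2>"])
model.resize_token_embeddings(len(tokenizer))  # grows input/output embeddings

# Train only the embedding matrix, analogous to the script's
# requires_grad toggle on the expanded embedding:
for p in model.parameters():
    p.requires_grad = False
model.get_input_embeddings().weight.requires_grad = True
```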

Hi authors, is [lr = trainer.lr_scheduler.step(global_step)](https://github.com/OpenLMLab/collie/blob/main/collie/controller/trainer.py#L413) implemented for Lomo in the Trainer? If so, how can it be enabled? Thanks!

bug
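
Judging only from the linked line, the Trainer appears to expect a scheduler whose `step(global_step)` returns the new learning rate. A minimal sketch of such an interface (an assumption, not CoLLiE's actual scheduler class):

```python
# Assumed interface, inferred from the linked trainer line: step(global_step)
# returns the learning rate for that step (linear warmup, then linear decay).
class LinearWarmupScheduler:
    def __init__(self, lr_max, warmup_steps, total_steps):
        self.lr_max = lr_max
        self.warmup_steps = warmup_steps
        self.total_steps = total_steps

    def step(self, global_step):
        if global_step < self.warmup_steps:
            return self.lr_max * global_step / max(1, self.warmup_steps)
        remaining = self.total_steps - global_step
        span = max(1, self.total_steps - self.warmup_steps)
        return self.lr_max * max(0.0, remaining / span)

lr = LinearWarmupScheduler(3e-4, 100, 10000).step(global_step=500)
```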

Can ZeRO-3 be used together with model parallelism? In my attempt I use

```
config.use_flash = False
config.tp_size = 4
config.ds_config = {
    "fp16": {
        "enabled": True
    },
    "zero_allow_untested_optimizer": True,
    "zero_force_ds_cpu_optimizer": False,
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": False...
```

help wanted
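
For reference, a complete ZeRO stage-3 block with CPU optimizer offload in a DeepSpeed config typically looks something like the sketch below; the exact batch-size values are illustrative assumptions, since the issue's config is truncated:

```python
# Illustrative ZeRO stage-3 DeepSpeed config with CPU optimizer offload;
# the batch-size values are assumptions completing the truncated config.
ds_config = {
    "fp16": {"enabled": True},
    "zero_allow_untested_optimizer": True,
    "zero_force_ds_cpu_optimizer": False,
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": False,
        },
    },
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
}
```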