CoLLiE
Collaborative Training of Large Language Models in an Efficient Way
Training llama2 70B with the latest dev branch code hits the following problem:
```
collie/collie/models/llama/model.py:203 in _forward

  200 │                     .permute(0, 2, 1, 4, 3) \
  201 │ ...
```
Dear authors, I found that collie cannot initialize DeepSpeed when using models from the transformers library. For example, when replacing [this line](https://github.com/OpenLMLab/collie/blob/main/examples/finetune_llama_for_summary.py#L81) of the script with the `from_pretrained` interface of...
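For concreteness, a hedged sketch of the substitution being described; the model path is a placeholder, and the commented-out original is an approximation of the example script, not a verbatim copy:

```python
# Hypothetical illustration of the swap: collie's own model class replaced
# by the plain transformers interface, which reportedly breaks DeepSpeed init.
from transformers import AutoModelForCausalLM

# original in the example script (approximate):
# model = LlamaForCausalLM.from_pretrained("/path/to/llama", config=config)

# substitution described in this report:
model = AutoModelForCausalLM.from_pretrained("/path/to/llama")
```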
Using the dev branch and the script in examples/further_pretrain_llama, launched with
```
torchrun --rdzv_backend=c10d --rdzv_endpoint=localhost:29402 --nnodes=1 --nproc_per_node=8 expand_vocab.py
```
Only the llama paths were changed (config, tokenizer, and model.from_pretrained). It fails with:
```
Traceback (most recent call last)
/d2/data/chuxiong/collie/examples/further_pretrain_llama/expand_vocab.py:85 in ...
  82 │ model.get_input_embedding()[1].weight.requires_grad = True
...
```
Hi authors, is [lr = trainer.lr_scheduler.step(global_step)](https://github.com/OpenLMLab/collie/blob/main/collie/controller/trainer.py#L413) implemented for Lomo in the Trainer? If so, how do I enable it? Thanks!
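For context, the quoted line suggests a scheduler whose `step(global_step)` both applies and returns the new learning rate. Below is a minimal sketch of a scheduler matching that assumed interface; it is an illustration, not a class shipped with collie:

```python
# Hypothetical scheduler matching the interface implied by
# `lr = trainer.lr_scheduler.step(global_step)`: linear warmup, then linear decay.
class LinearWarmupDecay:
    def __init__(self, optimizer, peak_lr, warmup_steps, total_steps):
        self.optimizer = optimizer
        self.peak_lr = peak_lr
        self.warmup_steps = warmup_steps
        self.total_steps = total_steps

    def step(self, global_step):
        if global_step < self.warmup_steps:
            lr = self.peak_lr * global_step / max(1, self.warmup_steps)
        else:
            progress = (global_step - self.warmup_steps) / max(
                1, self.total_steps - self.warmup_steps
            )
            lr = self.peak_lr * max(0.0, 1.0 - progress)
        # push the new lr into the optimizer's param groups and return it
        for group in self.optimizer.param_groups:
            group["lr"] = lr
        return lr
```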
Can ZeRO-3 be used together with model parallelism? In my attempt I set
```
config.use_flash = False
config.tp_size = 4
config.ds_config = {
    "fp16": {
        "enabled": True
    },
    "zero_allow_untested_optimizer": True,
    "zero_force_ds_cpu_optimizer": False,
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": False
...
```
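For reference, a hedged sketch of what a complete config along these lines might look like; the model path, dp_size, and the closing braces are assumptions filled in for illustration, and this does not assert that ZeRO-3 composes with tensor parallelism in collie:

```python
# Hedged sketch: a full CollieConfig combining tensor parallelism with
# ZeRO-3 + CPU optimizer offload. "/path/to/llama" and dp_size are placeholders.
from collie import CollieConfig

config = CollieConfig.from_pretrained("/path/to/llama")
config.use_flash = False
config.tp_size = 4  # tensor parallel degree
config.dp_size = 2  # ZeRO shards across this data-parallel group (assumes 8 GPUs total)
config.ds_config = {
    "fp16": {"enabled": True},
    "zero_allow_untested_optimizer": True,
    "zero_force_ds_cpu_optimizer": False,
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": False},
    },
}
```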
MOSS2 is in alpha testing.