
Collaborative Training of Large Language Models in an Efficient Way

Results: 26 CoLLiE issues

Hi, how can I make the saved model sharded instead of one huge checkpoint of tens of GB? I went through the parameters here, but the model is saved as a whole rather than in shards. Is there a way to do this?
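A generic workaround (not CoLLiE-specific; the helper name and shard naming below are hypothetical, loosely following the Hugging Face `pytorch_model-00001-of-000NN.bin` convention) is to split the state_dict into size-bounded shards before saving:

```python
import torch

def shard_state_dict(state_dict, max_shard_bytes=10 * 2**30):
    """Split a state_dict into shards of at most max_shard_bytes each.

    Hypothetical helper for illustration; CoLLiE may expose a different API.
    A single tensor larger than the limit still gets its own shard.
    """
    shards, current, current_size = [], {}, 0
    for name, tensor in state_dict.items():
        size = tensor.numel() * tensor.element_size()
        if current and current_size + size > max_shard_bytes:
            shards.append(current)
            current, current_size = {}, 0
        current[name] = tensor
        current_size += size
    if current:
        shards.append(current)
    return shards

# Each shard can then be saved as its own file:
# for i, shard in enumerate(shards):
#     torch.save(shard, f"pytorch_model-{i + 1:05d}-of-{len(shards):05d}.bin")
```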

enhancement

![image](https://github.com/OpenLMLab/collie/assets/37737346/6af3780f-45c1-4a3f-8760-e871b0183400) This breaks if `clip_grad_norm` is not None, so does AdaLomo simply not need `clip_grad_norm`?
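Background for the question above (my reading, not confirmed by the thread): LOMO-style optimizers fuse the parameter update into the backward pass, so full per-parameter gradients may no longer exist by the time a post-hoc clip runs. For reference, standard clipping uses torch's post-backward API, which requires every `p.grad` to still be populated:

```python
import torch

model = torch.nn.Linear(4, 2)
loss = model(torch.randn(3, 4)).sum()
loss.backward()

# clip_grad_norm_ walks p.grad for every parameter, rescaling them in
# place so their combined norm is at most max_norm; it returns the
# total norm measured before clipping.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```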

Hi @KaiLv69, thanks for the writeup and implementation of AdaLomo! It looks like it is [missing](https://github.com/OpenLMLab/collie/blob/dev/collie/optim/adalomo.py) the `step` method, which torch needs in order to use this optimizer in other frameworks. Can you...
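For context, `torch.optim.Optimizer` subclasses are expected to implement `step(closure=None)`. A minimal sketch of that interface (the update rule here is plain SGD, not AdaLomo's actual algorithm):

```python
import torch

class MinimalOptimizer(torch.optim.Optimizer):
    """Illustrates the `step` signature torch expects from optimizers."""

    def __init__(self, params, lr=1e-3):
        super().__init__(params, defaults={"lr": lr})

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            # Frameworks like LBFGS-style trainers pass a closure that
            # re-evaluates the model and returns the loss.
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-group["lr"])
        return loss
```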

Our guess is that the parallel size or batch size is set incorrectly (it does not evenly divide the dataset size), so some data may be computed more than once.
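The guess above is easy to check with arithmetic: if the dataset size is not divisible by the global batch (data-parallel size × per-rank batch size), a sampler that pads by wrapping around will recompute some samples. A hypothetical check:

```python
def duplicated_samples(num_samples, dp_size, batch_size):
    """Number of samples a wrap-around sampler would repeat so that
    every data-parallel rank receives the same number of full batches.
    Hypothetical helper for diagnosis, not a CoLLiE API.
    """
    global_batch = dp_size * batch_size
    remainder = num_samples % global_batch
    return 0 if remainder == 0 else global_batch - remainder
```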

Using the data class `_ShardContainer` raises the error below; what could the cause be? (dev branch) `ValueError: mmap closed or invalid` ![image](https://github.com/OpenLMLab/collie/assets/111968302/fb31dbed-ff8c-40ca-86d0-5a5960e1b0fc)

Hi there! I saved a parallel state_dict (requires_grad True only) with 8 GPUs on a remote machine. How can I load these state_dicts and save them as a single one locally? Thanks in advance. ``` collie_dp0_pp0_tp0.pt collie_zero_dp0_pp0_tp0.pt...

As a subclass of `torch.optim.Optimizer`, could `collie.optim.Lomo` support `param_groups` by calling `super().__init__(params, defaults)`? That would let it work with more schedulers and let us use the `per-parameter` method to filter out some modules...
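For reference, once an optimizer initializes through `super().__init__(params, defaults)`, it accepts the per-parameter list-of-dicts interface the issue asks for, which schedulers and module filters rely on (shown here with `torch.optim.SGD`):

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.LayerNorm(4))

# Each dict is one param group; options set here override the defaults
# for that group only. A common use: exclude norm layers from weight decay.
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "weight_decay": 0.01},
        {"params": model[1].parameters(), "weight_decay": 0.0},
    ],
    lr=1e-2,
)
```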

Due to the Grouped-Query Attention introduced in LLaMA-2 70B ([llama issue](https://github.com/facebookresearch/llama/issues/384)), it cannot be loaded into the collie implementation of LLaMA. Hope LLaMA-2 70B can be supported in collie. Thanks. ``` Traceback...
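Some context on why the load fails: GQA checkpoints store fewer key/value heads than query heads, so the K/V projection shapes no longer match a multi-head-attention implementation. The usual fix repeats each KV head to cover its group of query heads, along the lines of the reference LLaMA-2 code (a sketch, not collie's implementation):

```python
import torch

def repeat_kv(x, n_rep):
    """Expand (batch, n_kv_heads, seq, head_dim) to
    (batch, n_kv_heads * n_rep, seq, head_dim) by repeating each KV head
    n_rep times, as Grouped-Query Attention requires before attention.
    """
    if n_rep == 1:
        return x
    b, h, s, d = x.shape
    return x[:, :, None, :, :].expand(b, h, n_rep, s, d).reshape(b, h * n_rep, s, d)
```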

bug

With 8 V100 GPUs, ZeRO-3 enabled, TP=1, PP=1, DP=8, loading a LLaMA 70B model via `LlamaForCausalLM.from_pretrained` hits OOM (host RAM, not GPU memory) despite 512 GB of physical memory. The cause is in the dev branch, base.py line 304: state_dict = {} if not is_zero3_enabled(config) or env.dp_rank == 0 \ or config.low_cpu_mem_usage or config.quantization_config.load_in_8bit \...
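The intended behavior behind that condition, as I read the truncated snippet, is that under ZeRO-3 only `dp_rank` 0 should materialize the full state_dict in host memory; if every one of the 8 DP ranks loads a full 70B checkpoint, host RAM is exhausted. A schematic of such a gate (hypothetical function and argument names, not CoLLiE's actual code):

```python
def holds_full_state_dict(zero3_enabled, dp_rank,
                          low_cpu_mem_usage=False, load_in_8bit=False):
    """Which ranks should hold the full checkpoint in host memory.

    Without ZeRO-3 every rank needs the full weights; with ZeRO-3 only
    data-parallel rank 0 should load them (they are scattered afterwards),
    unless a low-memory or 8-bit loading path handles it differently.
    """
    return (not zero3_enabled) or dp_rank == 0 \
        or low_cpu_mem_usage or load_in_8bit
```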

bug