CoLLiE
Collaborative Training of Large Language Models in an Efficient Way
Hello, how can I make the saved model sharded instead of saving one huge checkpoint of tens of GB? I went through the parameters here, but the model always seems to be saved as a whole rather than in shards. Is there a way to do this?
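Setting CoLLiE's own saving options aside, the splitting the question asks about is just greedy packing of tensors into size-capped shards. A minimal sketch of that logic; the `sizeof` callback and the helper name are illustrative, not CoLLiE parameters:

```python
def shard_state_dict(state_dict, sizeof, max_shard_bytes):
    """Greedily pack tensors into shards no larger than max_shard_bytes.

    `sizeof(tensor)` should return the tensor's byte size (for torch
    tensors: tensor.numel() * tensor.element_size()). A single tensor
    larger than the limit still gets a shard of its own.
    """
    shards, current, current_size = [], {}, 0
    for name, tensor in state_dict.items():
        size = sizeof(tensor)
        if current and current_size + size > max_shard_bytes:
            shards.append(current)
            current, current_size = {}, 0
        current[name] = tensor
        current_size += size
    if current:
        shards.append(current)
    return shards

# Each shard can then be written separately, e.g. (requires torch):
# for i, shard in enumerate(shards):
#     torch.save(shard, f"pytorch_model-{i:05d}.pt")
```

This mirrors what `max_shard_size`-style sharded saving does elsewhere in the ecosystem: a shard index maps each parameter name to the file that holds it.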
![image](https://github.com/OpenLMLab/collie/assets/37737346/6af3780f-45c1-4a3f-8760-e871b0183400) This fails if `clip_grad_norm` is not None, so does adalomo simply not need `clip_grad_norm`?
Hi @KaiLv69, thanks for the writeup and implementation of AdaLomo! It looks like it is [missing](https://github.com/OpenLMLab/collie/blob/dev/collie/optim/adalomo.py) the `step` method that torch requires for use in other frameworks. Can you...
My guess is that the parallel size or batch size is set incorrectly (it does not divide the dataset size evenly), so some data may be processed more than once.
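The divisibility concern above can be checked up front before launching a run; a tiny sanity check (function and argument names are illustrative):

```python
def splits_evenly(num_samples, dp_size, batch_size):
    """True when the dataset divides evenly into global batches
    (data-parallel size x per-rank batch size), so no sample has to be
    dropped or duplicated to pad out the last batch."""
    return num_samples % (dp_size * batch_size) == 0
```

If this returns False, either trim the dataset, adjust the batch size, or rely on a sampler that explicitly drops the ragged tail.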
Using the data class `_ShardContainer` raises the error below; what could be the cause? (dev branch) `ValueError: mmap closed or invalid` ![image](https://github.com/OpenLMLab/collie/assets/111968302/fb31dbed-ff8c-40ca-86d0-5a5960e1b0fc)
Hi there! I saved a parallel state_dict (requires_grad True only) with 8 GPUs remotely. How can I load these state_dicts and save them as a single checkpoint locally? Thanks in advance. ``` collie_dp0_pp0_tp0.pt collie_zero_dp0_pp0_tp0.pt...
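Assuming the shards are ordinary state_dicts saved with `torch.save` and that any parameter replicated across ranks is identical in each copy, a minimal merge could look like this (the helper name is mine, not a CoLLiE API):

```python
def merge_state_dicts(shards):
    """Merge a list of state_dicts into one.

    Parameter names are assumed to be disjoint across shards; for
    replicated parameters that appear in several shards, the first
    occurrence wins.
    """
    merged = {}
    for shard in shards:
        for name, tensor in shard.items():
            merged.setdefault(name, tensor)
    return merged

# Usage (requires torch):
# import torch
# paths = ["collie_dp0_pp0_tp0.pt", ...]  # one file per parallel rank
# shards = [torch.load(p, map_location="cpu") for p in paths]
# torch.save(merge_state_dicts(shards), "merged.pt")
```

Note this simple concatenation only works when each rank holds whole tensors; ZeRO-partitioned shards (the `collie_zero_*` files) would first need their flattened partitions reassembled.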
As a subclass of `torch.optim.Optimizer`, could `collie.optim.Lomo` support `param_groups` by calling `super().__init__(params, defaults)`? That way we could fit more schedulers and use the per-parameter-options method to filter out some modules...
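What the request amounts to can be sketched as follows. This is illustrative only, not CoLLiE's code: the point is that routing parameters through `torch.optim.Optimizer.__init__` populates `self.param_groups`, which is what LR schedulers and per-parameter options rely on; the update rule below is a plain-SGD placeholder, not Lomo's fused-backward update.

```python
import torch

class LomoLike(torch.optim.Optimizer):
    """Sketch: registering params via the base class gives param_groups."""

    def __init__(self, params, lr=1e-3):
        defaults = dict(lr=lr)
        super().__init__(params, defaults)  # builds self.param_groups

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-group["lr"])  # SGD placeholder

# Per-parameter groups then work as with any torch optimizer:
# opt = LomoLike([{"params": model.body.parameters(), "lr": 1e-4},
#                 {"params": model.head.parameters()}], lr=1e-3)
```

Because schedulers like `torch.optim.lr_scheduler.StepLR` only read and write `optimizer.param_groups[i]["lr"]`, this change alone would make them compatible.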
Due to the Grouped-Query Attention introduced in LLaMA-2 70B ([llama issue](https://github.com/facebookresearch/llama/issues/384)), it cannot be loaded into the collie implementation of LLaMA. Hope LLaMA-2 70B can be supported in collie. Thanks ``` Traceback...
With 8 V100 GPUs, ZeRO-3 enabled, TP=1, PP=1, DP=8, loading the llama 70B model via `LlamaForCausalLM.from_pretrained` hits an OOM (host memory, not GPU memory; 512 GB of physical RAM). The cause is in the dev branch, base.py line 304: state_dict = {} if not is_zero3_enabled(config) or env.dp_rank == 0 \ or config.low_cpu_mem_usage or config.quantization_config.load_in_8bit \...
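The fix the report points at is the standard ZeRO-3 loading pattern: only one data-parallel rank should materialize the full checkpoint in host RAM, with weights scattered to the other ranks afterwards. A sketch of that gating predicate (function and argument names are mine, not CoLLiE's):

```python
def rank_loads_full_checkpoint(zero3_enabled, dp_rank):
    """Standard ZeRO-3 pattern: without ZeRO-3 every rank needs the full
    checkpoint; with ZeRO-3 only dp_rank 0 loads it and the partitions
    are distributed afterwards. Loading on all DP ranks is what
    multiplies a 70B checkpoint's host-RAM footprint by DP=8."""
    return (not zero3_enabled) or dp_rank == 0
```

With DP=8 and a roughly 140 GB fp16 checkpoint, loading on every rank would need over 1 TB of host memory, which matches the reported OOM on a 512 GB machine.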