LOMO
LOMO: LOw-Memory Optimization
Hi, when I use the eval code, it shows: `Traceback (most recent call last): File "/data/sx/gpt-quant/gpt-eval-opencompass/opencompass/opencompass/tasks/openicl_infer.py", line 7, in from mmengine.config import Config, ConfigDict ModuleNotFoundError: No module named...`
qlora 8196 23366MiB / 81251MiB config = LoraConfig( r=8, lora_alpha=32, inference_mode=False, target_modules=["q_proj","v_proj","down_proj","up_proj"], lora_dropout=0.05, bias="none", task_type="CAUSAL_LM" ) Test code: # llama tokenizer = LlamaTokenizer.from_pretrained(model_id) tokenizer.padding_side = "right" tokenizer.bos_token_id = 1 tokenizer.eos_token_id =...
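For context on why a rank-8 LoRA setup like the one above is so cheap: with rank `r`, LoRA attaches two trainable factors, `B` of shape (d_out, r) and `A` of shape (r, d_in), to each targeted weight, adding `r * (d_out + d_in)` parameters per matrix. A plain-Python sketch (the 4096 x 4096 shape is the standard LLaMA-7B `q_proj`/`v_proj` size; the helper name is ours, not from the repo):

```python
def lora_extra_params(d_out, d_in, r):
    """Trainable parameters LoRA adds to one (d_out, d_in) weight:
    B of shape (d_out, r) plus A of shape (r, d_in)."""
    return r * (d_out + d_in)

# LLaMA-7B attention projections are 4096 x 4096; with r=8 each adapted
# matrix gains 8 * (4096 + 4096) = 65,536 trainable parameters, versus
# 16,777,216 frozen ones -- roughly 0.4% per matrix.
print(lora_extra_params(4096, 4096, 8))  # 65536
```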
Can you provide detailed dependency versions?
Hello, thank you for the excellent work. I ran into a problem with the chatglm2 model and AdaLomo: Traceback (most recent call last): File "/home/pycharmProjcet/adalomo/instruction-tuning/train_dx_chatglm2.py", line 276, in train() File "/home/pycharmProjcet/adalomo/instruction-tuning/train_dx_chatglm2.py", line 259, in train trainer.train() File "/home/anaconda3/envs/adalomo/lib/python3.10/site-packages/collie_lm-1.0.5-py3.10.egg/collie/controller/trainer.py", line 364, in train loss = self.train_fn(self, batch, self.global_batch_idx)...
Hello, and thanks again for the excellent LOMO series of work. I am currently training with the collie framework, using the latest dev-branch code. Training a model at the Llama 2 13B scale works fine, but with 70B I keep running into an error. The script I use is the instruction-tuning script provided directly in this project's adalomo directory; the only changes were setting tp=2 and supplying my own model path. Have you encountered this, and how should it be configured? My machine is an 8*A100 setup. Thanks!
The paper compares the results of zero-shot, LoRA, and LOMO, but results for Adam or AdamW are missing. How large is the gap between LOMO and Adam, and have you run that experiment? Looking forward to your reply, thanks!
I fine-tuned chatglm2-6b with LOMO (and ZeRO-3) on 8 NVIDIA 3090 GPUs and saved the model with LOMOTrainer's save_model method. After reloading the checkpoint, the validation loss I measured differed from the one at the end of training. Following DeepSpeed's official model-saving code, I rewrote save_model (rewritten code below), and the bug disappeared. This shows the original save_model has a bug, though I have not yet found the exact cause.
I ran into this problem while experimenting with the MultiRC data and using LoRA only. The first epoch and its validation pass were successful, but this error happened on the second epoch, so...
In the full-parameter-update setting, I have recently been trying a new loss function: adding a regularization term to the usual next-token prediction loss so that the weights of certain layers are kept as small as possible. However, I keep running into strange bugs. Below is the code I added and the error I got, e.g. in lomo_trainer.py:
```python
lamda, regularization = 1, torch.tensor(0, requires_grad=True, dtype=torch.float32)
self.model.train()
for name, param in self.model.named_parameters():
    if "self_attn.q_proj" in name:
        with GatheredParameters(param):
            regularization =...
```
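Independent of the DeepSpeed/collie machinery, the intended objective above is simply the base loss plus a scaled sum of squared weights over the matching parameters. A minimal plain-Python sketch of that computation (`regularized_loss` and the toy parameter list are illustrative, not part of the repo):

```python
# Sketch of the regularized objective from the excerpt above:
# loss = next-token loss + lamda * sum of squared weights of selected layers.
# The name filter mimics "self_attn.q_proj"; values are made up.
lamda = 1.0

def regularized_loss(base_loss, named_params, target="self_attn.q_proj"):
    """Add an L2 penalty over every parameter whose name contains `target`."""
    penalty = sum(w * w for name, ws in named_params if target in name for w in ws)
    return base_loss + lamda * penalty

toy_params = [
    ("layers.0.self_attn.q_proj.weight", [0.5, -0.5]),  # penalized: 0.25 + 0.25
    ("layers.0.mlp.up_proj.weight", [3.0]),             # name does not match
]
print(regularized_loss(2.0, toy_params))  # 2.0 + 1.0 * 0.5 = 2.5
```

In the real trainer the same accumulation would run on gathered tensors, so the penalty must be rebuilt inside the autograd graph each step rather than kept in a pre-created leaf tensor.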
Dear authors, following the repository's configuration I set both per_device_train_batch_size and per_device_eval_batch_size to 1, but training LLaMA-7B with lomo_lora_trainer.py on a single 16GB V100 still runs out of memory (OOM). The exact configuration:
```
# model
model_name_or_path: 'openlm-research/open_llama_7b'

# data
dataset_name: 'wic'
refresh: false
data_tag: 'base'
train_on_inputs: false
data_max_length: 1024

# training
# trainer
peft_type: 'lora'
lora_only: false
hf_learning_rate: 0.0005
hf_weight_decay:...
```