Ben Rood comments

Results 90 comments of Ben Rood

How much GPU memory is needed to continue PEFT fine-tuning on Chinese-Alpaca-Plus-13B?

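> Yes. With multi-GPU training launched via torch.distributed.launch, the model is essentially loaded onto every GPU and the training data is split into 4 shards, so the speedup is at most 4x, and data communication also takes some time. In that case, can 13B be fine-tuned with 32G of GPU memory?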
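As a rough, weights-only sanity check for the 32G question above, the sketch below estimates how much memory the model weights alone occupy at different precisions. It assumes a nominal 13e9 parameters (the real parameter count of Chinese-Alpaca-Plus-13B may differ) and ignores activations, gradients, optimizer state and the LoRA adapter itself, so it gives a lower bound rather than an answer. `weight_memory_gib` is a hypothetical helper name, not part of any library.

```python
# Weights-only memory estimate; a lower bound, not a full training budget.
# ASSUMPTION: 13e9 is the nominal parameter count of a "13B" model.

GIB = 2 ** 30  # bytes per GiB


def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory taken by the model weights alone, in GiB."""
    return n_params * bytes_per_param / GIB


if __name__ == "__main__":
    n_params = 13e9  # assumed nominal size
    for name, width in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
        print(f"{name}: {weight_memory_gib(n_params, width):.2f} GiB")
    # fp32: 48.43 GiB
    # fp16: 24.21 GiB
    # int8: 12.11 GiB
```

Because the data-parallel launch described above loads a full copy of the model on every GPU, this per-GPU figure does not shrink when more cards are added.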

Is the http proxy inbound protocol incompatible with pip? [Bug]

> You can change `http` to `socks5http`. That way pip3 does get through over socks5, but the http inbound probably still counts as having a compatibility problem.

When will chatglm with lora support multi-GPU fine-tuning?

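> Placing the model and the data on different devices lets you run in parallel; you can refer to this implementation: https://github.com/yuanzhoulvpi2017/zero_nlp/tree/main/Chatglm6b_ModelParallel My card is a model on which half precision is much faster than single precision, but fp16=true does not seem to improve training speed. Do I need to add other parameters? training_chatglm_csc_demo.py: 102 model.train_model(args.train_file, args={'fp16': True})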

When will chatglm with lora support multi-GPU fine-tuning?

> I'm still working on this issue; fp16 training currently only reduces GPU memory usage and gives no speedup. Nice. I also tried tinkering with int8; after a few changes it still gets stuck at AttributeError: 'CastOutputToFloat' object has no attribute 'weight' ``` mambaforge/lib/python3.10/site-packages/peft/utils/other.py:75 in prepare_model_for_int8_training │ 72 │ if hasattr(model, output_embedding_layer_name): │ │ 73 │ │ output_embedding_layer = getattr(model, output_embedding_layer_name) │...

When will chatglm with lora support multi-GPU fine-tuning?

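> I'm still working on this issue; fp16 training currently only reduces GPU memory usage and gives no speedup. The transformers documentation also seems to say that fp16 may save GPU memory only at large batch sizes, and that getting a speedup places strict requirements on the model. https://huggingface.co/docs/transformers/v4.13.0/en/performance “So there is only a real memory saving if we train at a high batch size (and it’s not half) and at batch sizes lower...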
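The batch-size dependence quoted above can be illustrated with a toy accounting model. This is a sketch under stated assumptions, not a measurement: it assumes mixed-precision training keeps both an fp32 master copy and an fp16 working copy of the weights, that activation memory grows linearly with batch size, and it ignores gradients and optimizer state. The function names and the example sizes are hypothetical.

```python
# Toy memory model for fp32 vs. mixed-precision (fp16) training.
# ASSUMPTIONS: fp32 master weights + fp16 working copy under mixed precision;
# activations scale linearly with batch size; gradients/optimizer ignored.


def training_memory_bytes(n_params: int, act_elems_per_sample: int,
                          batch_size: int, mixed_precision: bool) -> int:
    if mixed_precision:
        weight_bytes = n_params * (4 + 2)   # fp32 master + fp16 copy
        act_bytes = act_elems_per_sample * batch_size * 2
    else:
        weight_bytes = n_params * 4         # fp32 only
        act_bytes = act_elems_per_sample * batch_size * 4
    return weight_bytes + act_bytes


def fp16_to_fp32_ratio(n_params: int, act_elems_per_sample: int,
                       batch_size: int) -> float:
    """Values below 1.0 mean mixed precision uses less memory in this model."""
    amp = training_memory_bytes(n_params, act_elems_per_sample, batch_size, True)
    full = training_memory_bytes(n_params, act_elems_per_sample, batch_size, False)
    return amp / full


if __name__ == "__main__":
    # Hypothetical sizes chosen only to make the crossover visible.
    for bs in (1, 10, 100):
        print(bs, round(fp16_to_fp32_ratio(1_000_000, 100_000, bs), 2))
```

In this model the ratio is above 1 at small batch sizes, because the extra fp16 weight copy dominates, and it approaches but never reaches 0.5 as the batch grows, which matches the "it's not half" remark in the quoted passage.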

Community maintained: CentOS 7 Support

I've used lxd/lxc to run Ubuntu 22.04 on CentOS 7 and then installed TLJH on it; it works great. The IT operator only gave me a VM with CentOS 7....

Recent White List issues

@Mygod a few tcpdump files from both sides for an ssr tls1.2 obfs. During that time, when I telnet to the server's ssh port (a high port other than 22), I can get ssh...

Question about ChatGLM model fine-tuning

> I haven't updated the chatglm-6b-belle-zh-lora weights; you can train them yourself, or use shibing624/chatglm-6b-csc-zh-lora Running train with the latest code also raises an error ``` /DaTa/.local/home/hai.li/mambaforge/lib/python3.10/site-packages/torch/_dynamo/variables/builder │ │ .py:812 in wrap_fx_proxy_cls │ │ │ │ 809 │ │ │ │ "ignore_subclass": ignore_subclass, │ │ 810 │ │ │ │ "is_tensor":...

Question about ChatGLM model fine-tuning

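> Update the chatglm-6b files. Is your torch version 2.0 or 1.13.1? I'm using torch 2.0. The earlier error is probably related to https://github.com/pytorch/pytorch/issues/97077, but after barely working around it I ran into a new error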

Question about ChatGLM model fine-tuning

> After updating the code it finally works, but continuing training still raises an error. It looks like peft fails when loading the trained model; is continued training not possible in the current mode? ``` textgen/examples/chatglm$ python training_chatglm_demo.py --do_train 2023-04-21 11:15:08.224 | INFO | textgen.chatglm.chatglm_model:train_model:235 - Restarting from ./outputs/adapter_model.bin ... │ /DaTa/dl/textgen_lora_train/textgen/examples/chatglm/../../textgen/chatglm/chatglm_model.py:241 in train_model if os.path.exists(checkpoint_name): logger.info(f"Restarting from {checkpoint_name}") adapters_weights =...