ChatGLM2-6B: Full-parameter fine-tuning [launch.py:315:sigkill_handler] Killing subprocess

Open jakeywu opened this issue 2 years ago • 4 comments

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

7 GPUs with 24 GB of memory each

Expected Behavior

No response

Steps To Reproduce

Launch script:

Environment

- OS:Ubuntu 18.04.6 LTS
- Python:3.10.9
- Transformers:4.30.2
- PyTorch:2.0.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : True

Anything else?

No response

Aug 09 '23 04:08 jakeywu

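You are hitting CUDA out of memory. I replaced `--fp16` with `--pre_seq_len 128 --quantization_bit 4` and it ran. However, there does not seem to be any distributed speedup: I specified 4 GPUs, and the GPU memory usage on every card is identical, so it looks as if each GPU is repeating the same computation with no acceleration.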
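To make the out-of-memory diagnosis concrete, here is a rough back-of-the-envelope sketch of the per-GPU memory that full-parameter fine-tuning needs for model state alone. It rests on several assumptions that are not stated in this thread: about 6.2e9 parameters for ChatGLM2-6B, mixed-precision Adam at roughly 16 bytes per parameter (fp16 weights, fp16 gradients, fp32 master weights plus two Adam moments), and DeepSpeed-style ZeRO partitioning across the 7 cards. The function name `model_state_bytes_per_gpu` is hypothetical, and activations, buffers, and fragmentation are ignored, so real usage will be higher.

```python
GIB = 1024 ** 3

def model_state_bytes_per_gpu(n_params, n_gpus, zero_stage=0):
    """Estimate per-GPU bytes for weights, gradients and Adam state.

    Assumes mixed-precision Adam; activations are NOT included.
    zero_stage follows the usual ZeRO convention:
      0 = plain data parallel (everything replicated)
      1 = optimizer state partitioned
      2 = optimizer state + gradients partitioned
      3 = optimizer state + gradients + weights partitioned
    """
    weights = 2.0 * n_params   # fp16 weights
    grads = 2.0 * n_params     # fp16 gradients
    optim = 12.0 * n_params    # fp32 master weights + Adam momentum + variance
    if zero_stage >= 1:
        optim /= n_gpus
    if zero_stage >= 2:
        grads /= n_gpus
    if zero_stage >= 3:
        weights /= n_gpus
    return weights + grads + optim

if __name__ == "__main__":
    N_PARAMS = 6.2e9   # assumed parameter count, not taken from this thread
    N_GPUS = 7         # from the issue: 7 cards with 24 GB each
    for stage in range(4):
        gib = model_state_bytes_per_gpu(N_PARAMS, N_GPUS, stage) / GIB
        print(f"ZeRO stage {stage}: ~{gib:.1f} GiB per GPU before activations")
```

Under these assumptions, plain data parallelism needs far more than 24 GB per card, and even with optimizer state and gradients partitioned the model state alone comes close to the card's capacity, which leaves little room for activations. Switching to P-Tuning with 4-bit quantization, as in the comment above, avoids this because only a small set of prefix parameters is trained.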

Aug 09 '23 06:08 feipengheart

Is there an existing issue for this?

[x] I have searched the existing issues

Current Behavior
7 GPUs with 24 GB of memory each ![image](https://user-images.githubusercontent.com/11456239/259295223-40f1a478-f807-42ce-8b42-f5ac25632223.png)
Expected Behavior

No response

Steps To Reproduce
Launch script: ![image](https://user-images.githubusercontent.com/11456239/259295549-087de0c7-7336-4500-b219-0320cc467f0e.png)
Environment
- OS:Ubuntu 18.04.6 LTS
- Python:3.10.9
- Transformers:4.30.2
- PyTorch:2.0.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : True
Anything else?

No response

Hi, has this problem been solved?

Aug 09 '23 09:08 xlhuang132

No, I haven't managed to solve it. @xlhuang132

Oct 10 '23 10:10 jakeywu

How do you do full-parameter fine-tuning? Help needed.

Jan 29 '24 09:01 guanslai
