GXKIM
> The PT (pre-training) stage needs GB-scale data, while SFT only needs a few tens of thousands of examples. Right now you can't pass `--finetuning_type qlora` directly, can you? I only see `lora`.
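For what it's worth, in LLaMA-Factory QLoRA is typically enabled by keeping `--finetuning_type lora` and adding a quantization flag such as `--quantization_bit 4`, rather than a separate `qlora` type. Below is a minimal sketch of the same idea done directly with `peft` + `bitsandbytes`; the model id and hyperparameters are illustrative, not from this thread:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model with 4-bit NF4 quantization (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative model id
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach trainable LoRA adapters on top of the frozen 4-bit base weights.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The 4-bit base stays frozen and only the LoRA adapters train, which is what keeps the VRAM footprint small.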
> You mean 1.5 consumes more memory than 1? I did test the inference costs; see the doc here: https://qwen.readthedocs.io/en/latest/benchmark/hf_infer.html . Maybe I should add training costs for you...
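In the meantime, one rough way to measure peak inference VRAM for any checkpoint yourself; this is a sketch, and the model id and prompt are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-0.5B"  # placeholder; substitute the sizes you are comparing
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# Reset the counter, run one generation, then read back the high-water mark.
torch.cuda.reset_peak_memory_stats()
inputs = tok("Hello", return_tensors="pt").to("cuda")
model.generate(**inputs, max_new_tokens=128)
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```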
> Context length might be a factor. Are you using the official script, LLaMA-Factory, or Axolotl? The LLaMA-Factory framework uses normal VRAM for LoRA, but when using the...
> > > Context length might be a factor. Are you using the official script, LLaMA-Factory, or Axolotl? > > The LLaMA-Factory framework uses...
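To make the context-length point concrete: the raw attention-score tensor in one layer has shape (batch, heads, seq, seq), so its memory grows quadratically with sequence length. A back-of-the-envelope sketch, with head count and dtype chosen for illustration:

```python
def attention_scores_mib(batch: int, heads: int, seq_len: int, dtype_bytes: int = 2) -> float:
    """MiB taken by one layer's (batch, heads, seq, seq) attention-score matrix."""
    return batch * heads * seq_len * seq_len * dtype_bytes / 2**20

for seq in (512, 2048, 8192):
    print(f"seq={seq}: {attention_scores_mib(1, 32, seq):,.0f} MiB per layer")
```

Fused or flash-attention kernels avoid materializing this matrix, which is one reason frameworks can differ in VRAM use at the same context length.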
I ran into the same problem when running train_sft.py.
> I haven't tried loading pre-quantized int8 weights directly, but in theory, if it breaks, changing the loading code should be enough. Hi author, if I run finetune.py with the int8 parameter set, does the corresponding model also need to be the chatglm-6b-int8 one?
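For reference, ChatGLM-6B's remote code supports both routes, so which checkpoint finetune.py expects depends on that script. A loading sketch: both repo ids below are the public THUDM checkpoints, and the second route follows the pattern from the ChatGLM-6B README:

```python
from transformers import AutoModel

# Route 1: load the checkpoint that was already quantized to int8.
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b-int8", trust_remote_code=True
).half().cuda()

# Route 2: load the full checkpoint and quantize it at load time instead.
# model = AutoModel.from_pretrained(
#     "THUDM/chatglm-6b", trust_remote_code=True
# ).quantize(8).half().cuda()
```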
> (glm-130b) ➜ GLM-130B git:(main) ✗ bash scripts/evaluate.sh tasks/bloom/glue_cola.yaml > WARNING:torch.distributed.run: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded,...
> > Question: torchrun can't be found. Is this a torch/CUDA version problem? Any advice? > Are you running on a V100 machine? V100 needs this installed: `pip install bminf` It's an A100.
> > Question: torchrun can't be found. Is this a torch/CUDA version problem? Any advice? > Are you running on a V100 machine? V100 needs this installed: `pip install bminf` The error appears when executing the script, and I can see the script does use torchrun; it's the scripts/generate.sh script.
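One thing worth ruling out before blaming the CUDA version: `torchrun` has shipped as a console script since PyTorch 1.10, and `python -m torch.distributed.run` is the same launcher, so if only the entry point is missing from PATH you can fall back to the module form. A quick check, sketched rather than taken from this thread:

```python
import shutil
import subprocess
import sys

# torchrun is the console-script alias of torch.distributed.run (PyTorch >= 1.10).
if shutil.which("torchrun") is None:
    # Entry point missing from PATH: invoke the module directly instead.
    subprocess.run([sys.executable, "-m", "torch.distributed.run", "--help"], check=True)
```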
> CPU RAM is 256 GB, GPUs are six RTX 3090s. > WARNING:torch.distributed.run: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the...
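That `OMP_NUM_THREADS` warning is informational: the launcher only defaults the variable to 1 when it is unset. Exporting a value before starting the script makes the choice explicit and silences it. A sketch, where the value 4 is illustrative rather than a tuned recommendation:

```python
import os
import subprocess

# Set OMP_NUM_THREADS in the child environment before the launcher starts,
# so torch.distributed.run does not fall back to its default of 1.
env = dict(os.environ, OMP_NUM_THREADS="4")
subprocess.run(
    ["bash", "scripts/evaluate.sh", "tasks/bloom/glue_cola.yaml"],
    env=env,
    check=True,
)
```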