Daniel-1997 issues

Results: 3 issues



Do evaluation during training

When I read through the parameters to set for finetune.py, I got a little confused, since there are several parameters related to evaluation during training: --validation_file: I did not find...
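The truncated issue does not show which evaluation flags finetune.py actually exposes. As a library-agnostic sketch of what "evaluation during training" usually means, the loop below runs a validation pass every `eval_every` training steps. All names here (`train_with_periodic_eval`, `train_step`, `eval_step`, `eval_every`) are hypothetical and are not finetune.py's API.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def train_with_periodic_eval(
    train_batches: Iterable,
    val_batches: Sequence,
    train_step: Callable[[object], float],
    eval_step: Callable[[object], float],
    eval_every: int,
) -> List[Tuple[int, float]]:
    """Hypothetical loop: train on every batch, and every `eval_every` steps
    record (step, mean validation loss) over the whole validation set."""
    if eval_every <= 0:
        raise ValueError("eval_every must be positive")
    if len(val_batches) == 0:
        raise ValueError("val_batches must not be empty")
    history: List[Tuple[int, float]] = []
    step = 0
    for batch in train_batches:
        train_step(batch)  # one optimisation step
        step += 1
        if step % eval_every == 0:
            # Validation pass: no parameter updates, just average the loss.
            val_loss = sum(eval_step(b) for b in val_batches) / len(val_batches)
            history.append((step, val_loss))
    return history


if __name__ == "__main__":
    # Dummy steps so the sketch runs on its own.
    hist = train_with_periodic_eval(
        train_batches=range(10),
        val_batches=[0, 1],
        train_step=lambda b: 1.0,
        eval_step=lambda b: 0.5,
        eval_every=4,
    )
    print(hist)
```

With 10 training batches and `eval_every=4`, validation runs after steps 4 and 8; the validation data comes from a separate set (what a flag like --validation_file would supply) so it never influences the parameter updates.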

ZeRORuntimeException: You are using ZeRO-Offload with a client provided optimizer

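GPU setup: 2× V100 32G (there are four in total; the other two are currently in use by someone else, and once they are free I will be able to use all 4 V100s). With the default accelerate config I get the error: cuda out of memory. I noticed that in the default config both offload_optimizer_device and offload_param_device are set to none. Following the accelerate tutorial, I changed both parameters to cpu and then got this error: ![image](https://github.com/OpenLMLab/MOSS/assets/59271872/7d446a9f-b69c-40ad-8cb7-946a58376a00) The accelerate config is as follows:
    command_file: null
    commands: null
    compute_environment: LOCAL_MACHINE
    deepspeed_config:
      gradient_accumulation_steps: 1
      gradient_clipping: 1.0
      offload_optimizer_device: cpu...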
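The exception in the title is raised by DeepSpeed when ZeRO-Offload is combined with an optimizer that the training script constructs itself (for example a plain torch AdamW). One way around it, sketched here under assumptions, is to declare the optimizer inside a DeepSpeed JSON config so that DeepSpeed builds its own CPU-offload-capable Adam; another is to construct `deepspeed.ops.adam.DeepSpeedCPUAdam` in the script instead of the torch optimizer. The helper name `build_ds_config`, the learning rate and the micro-batch size are hypothetical; verify the keys against the DeepSpeed configuration docs for your installed version. With accelerate, such a file would be referenced through `deepspeed_config_file` in the accelerate config, and the script would then pass `accelerate.utils.DummyOptim` rather than a real optimizer (per accelerate's DeepSpeed guide).

```python
import json


def build_ds_config(lr: float = 1e-5, stage: int = 3, micro_batch: int = 1) -> dict:
    """Hypothetical helper: DeepSpeed config with CPU offload and an optimizer
    defined in the config itself (so DeepSpeed, not the script, builds it)."""
    zero = {
        "stage": stage,
        # Keep optimizer states in (pinned) CPU memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    }
    if stage == 3:
        # Parameter offload is only supported with ZeRO stage 3.
        zero["offload_param"] = {"device": "cpu", "pin_memory": True}
    return {
        "train_micro_batch_size_per_gpu": micro_batch,  # assumption: set to your per-GPU batch size
        "gradient_accumulation_steps": 1,   # matches the accelerate config in the issue
        "gradient_clipping": 1.0,           # matches the accelerate config in the issue
        "optimizer": {"type": "AdamW", "params": {"lr": lr, "weight_decay": 0.0}},
        "zero_optimization": zero,
    }


if __name__ == "__main__":
    # Print the JSON; save it to a file and point the accelerate config at it.
    print(json.dumps(build_ds_config(), indent=2))
```

Letting DeepSpeed own the optimizer keeps the optimizer step on the CPU alongside the offloaded states, which is the situation the exception is steering users toward.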

When running inference on a single GPU with the default code, why do processes (and memory usage) also appear on the other GPUs?

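![image](https://github.com/mymusise/ChatGLM-Tuning/assets/59271872/e0191855-2e55-4cc1-804d-72d6f2eb0628) As shown above, I used the inference code provided in this project as-is, with both the model and the data loaded onto GPU 0, but GPUs 2, 3 and 4 also show memory usage. GPU 0 uses the most (13G+), while each of the other GPUs uses roughly 4G+. What could be causing this? ![image](https://github.com/mymusise/ChatGLM-Tuning/assets/59271872/53888ff9-ec4c-4c73-820b-0fa3d7394eef)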
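The issue does not confirm the cause. One common explanation (an assumption here) is that something in the process, such as a `device_map="auto"` load or a library initialising a CUDA context, touches every GPU it can see. A minimal sketch of a workaround is to hide the other GPUs from the process before torch is imported, so only the chosen card is visible; `visible_gpu_ids` is a hypothetical helper for checking the setting.

```python
import os

# These must be set before torch (or anything else that initialises CUDA) is imported.
# PCI_BUS_ID ordering makes the index match what nvidia-smi shows.
os.environ.setdefault("CUDA_DEVICE_ORDER", "PCI_BUS_ID")
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only physical GPU 0


def visible_gpu_ids() -> list:
    """Hypothetical helper: physical GPU ids this process is allowed to use."""
    value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [int(x) for x in value.split(",") if x.strip()]


try:
    import torch
except ImportError:  # torch not installed; nothing more to check
    torch = None

if torch is not None and torch.cuda.is_available():
    # Only one device should be visible now, addressed as cuda:0.
    print("visible CUDA devices:", torch.cuda.device_count())
```

The same effect can be had without touching the script by setting the environment variable on the command line when launching the inference script.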