InternEvo

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.

Results 79 InternEvo issues

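### Describe the question. Can InternEvo load pre-trained Llama 2 parameters and then continue pre-training from them? Should the weights be in HF format or in the original format?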

question

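### Describe the bug After training with InternEvo and converting the model to HF format, testing PPL with OpenCompass produces NaN with some probability. OpenCompass tests in fp16 by default; is that the cause? Switching to bf16 resolves the problem, but other HF models do not show this issue. Is it related to use_fp32_norm? Training was done in bf16. ### Environment Official image ### Other information _No response_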

bug
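As background for the fp16 versus bf16 question above, here is a minimal sketch of one well-known way fp16 can produce NaN where bf16 does not: fp16 overflows to infinity just above 65504, and a later `inf - inf` yields NaN, while bf16 keeps the float32 exponent range. The helper names `to_fp16` and `to_bf16` are hypothetical, bf16 is emulated by truncating a float32 encoding, and nothing here confirms that this is the actual cause of the reported behaviour.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE half precision; out-of-range values become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the fp16 range; hardware
        # would saturate to +/-inf instead, so mimic that here.
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the float32 encoding.

    This truncates instead of rounding, which is enough to show the range.
    """
    high_bytes = struct.pack(">f", x)[:2]
    return struct.unpack(">f", high_bytes + b"\x00\x00")[0]


# Hypothetical activation magnitude, chosen only to exceed the fp16 range.
activation = 70000.0

half = to_fp16(activation)
brain = to_bf16(activation)

print("fp16:", half, "-> fp16 - fp16 =", half - half)
print("bf16:", brain, "-> bf16 - bf16 =", brain - brain)
```

If large intermediate values are the cause, checking the converted model's activations for magnitudes above the fp16 range would be one way to confirm it.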

### Describe the bug I cannot seem to find a script that converts a model trained with InternEvo to the corresponding HF format. Is one provided? ### Environment Official code ### Other information _No response_

bug

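### Describe the bug The TFLOPS value in the InternEvo code is currently computed directly from a formula, but when tp or pp is used the model is split across devices, which makes the reported TFLOPS inaccurate. ### Environment Official image code ### Other information _No response_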

bug
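To illustrate why parallelism matters for a formula-based TFLOPS figure, here is a rough sketch using the common approximation of 6 FLOPs per parameter per token for a forward and backward pass. The function `per_gpu_tflops` and all numbers are hypothetical; this is not InternEvo's formula, and the issue does not say where exactly its computation goes wrong.

```python
def per_gpu_tflops(n_params: float, tokens_per_step: float,
                   step_seconds: float, dp: int, tp: int = 1, pp: int = 1) -> float:
    """Estimate achieved TFLOPS per GPU for one training step.

    Assumes roughly 6 * n_params FLOPs per token (forward + backward)
    and that the work is spread over dp * tp * pp devices.
    """
    world_size = dp * tp * pp
    total_flops = 6.0 * n_params * tokens_per_step
    return total_flops / step_seconds / world_size / 1e12


# Hypothetical run: 7B parameters, 4M tokens per step, 10 s per step.
with_all_gpus = per_gpu_tflops(7e9, 4e6, 10.0, dp=8, tp=4, pp=2)
# Ignoring tp and pp divides by too few devices and inflates the result.
ignoring_split = per_gpu_tflops(7e9, 4e6, 10.0, dp=8)

print(f"dividing by dp*tp*pp: {with_all_gpus:.1f} TFLOPS per GPU")
print(f"dividing by dp only:  {ignoring_split:.1f} TFLOPS per GPU")
```

In this sketch the per-GPU figure is only meaningful when the divisor counts every device that holds a shard of the model, not just the data-parallel replicas.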

### Describe the feature A very simple on-the-fly dataloader is needed to support most public datasets ### Will you implement it? - [X] I would like to implement this feature...

enhancement

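### Describe the bug When training an MoE model, the model's TFLOPS is only in the tens, whereas it is normal during regular (non-MoE) training. ### Environment Official image code ### Other information _No response_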

bug

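### Describe the question. I trained a 7B model with InternEvo and obtained InternEvo model weights. I now want to train an MoE model starting from these weights, but loading them raises the error below. How can I resolve it? AssertionError: /beegfs/workspace/nlp/leo/model_ckpt/7B_v7/715255/model_moe_layer0_expert0_tp0.pt is not found!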

question

### Describe the question. Does the MoE implementation in the InternEvo framework support expert parallelism? If so, how is it used? Without it, training MoE directly seems to give very low TFLOPS.

question

### Describe the bug Traceback (most recent call last): File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/pool.py", line 131, in worker put((job, i, result)) File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/queues.py", line 368, in put self._writer.send_bytes(obj) File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/connection.py", line 200, in...

bug

Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand...