刘维克

Results: 10 comments of 刘维克

# I ran into the same problem and solved it by modifying the training launch command. - I hit the same error: ` 07/29 22:44:31 - mmengine - WARNING - Failed to search registry with scope "mmengine" in the "builder" registry tree. As a workaround, the current "builder" registry...

> # I ran into the same problem and solved it by modifying the training launch command. > * I hit the same error: ` 07/29 22:44:31 - mmengine - WARNING - Failed to search registry with scope "mmengine" in the "builder" registry tree. As a workaround, the current...

# DEBUG solved: I added ninja's location to `PATH`, and `deepspeed` now works; the problem is fully resolved. - I followed a nearly identical issue, [Training launch with deepspeed_zero2 fails #80](https://github.com/InternLM/xtuner/issues/80). The maintainer suggested:

> For now this does not look like insufficient GPU memory; I can start DeepSpeed training normally on two T4s.
> I suspect a broken DeepSpeed installation. Could you run `ds_report` and check for errors?
> If that command reports no problems, try running one of the official DeepSpeed example scripts, such as [DeepSpeed_CIFAR](https://github.com/microsoft/DeepSpeedExamples/tree/master/training/cifar), to verify that DeepSpeed can start.

I ran that official check script and found that `ninja` was not being detected. After adding the following in Jupyter:

```python
import os
os.environ['PATH'] += ':/home/aistudio/.local/bin'  # for ninja
os.environ['PATH'] += ':/home/aistudio/.local/lib/python3.10/site-packages/ninja/data/bin'
```

`deepspeed` training worked perfectly! Attached command: `!/home/aistudio/.local/bin/xtuner...
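As a quick sanity check (a minimal sketch; the two paths are specific to the AI Studio setup above and will differ on other machines), you can verify that `ninja` is discoverable after patching `PATH`:

```python
import os
import shutil

# Directories that contain the ninja binary in the AI Studio environment
# described above (assumed paths; adjust for your machine).
for extra in ('/home/aistudio/.local/bin',
              '/home/aistudio/.local/lib/python3.10/site-packages/ninja/data/bin'):
    if extra not in os.environ.get('PATH', ''):
        os.environ['PATH'] += ':' + extra

# shutil.which returns None when the executable cannot be found on PATH,
# which is exactly the condition that made DeepSpeed's JIT build fail.
print('ninja found at:', shutil.which('ninja'))
```

If this still prints `None`, DeepSpeed's extension build will keep failing, so fix the path before retrying the training command.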

> [@jhaggle](https://github.com/jhaggle) I think the problem is not the list comprehension by itself but the fact that `label` can also include elements with the `ignore_index` (default 255). This makes the...

`'RandomResize'` raises `RuntimeError: stack expects each tensor to be equal size, but got [1, 512, 512] at entry 0 and [1, 512, 527] at entry 5`, so I changed...
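The underlying failure is generic: `stack` requires every tensor in a batch to have the same shape, and a resize that preserves aspect ratio produces varying widths. A minimal NumPy illustration of the same collation failure (not the original mmseg code):

```python
import numpy as np

a = np.zeros((1, 512, 512))
b = np.zeros((1, 512, 527))  # resize kept the aspect ratio, so the width differs

# Collating the batch fails because the shapes disagree.
try:
    np.stack([a, b])
except ValueError as e:
    print('stack failed:', e)

# Cropping (or padding) every sample to a fixed size first makes collation work.
b_fixed = b[:, :, :512]
batch = np.stack([a, b_fixed])
print(batch.shape)  # (2, 1, 512, 512)
```

This is why pipelines usually follow `RandomResize` with a fixed-size `RandomCrop` (or padding) before batching.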

This is strange. From the information you provided, you have successfully started 12 `num_workers`, yet only two CPU threads are occupied. Does your server have...
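To rule out an under-provisioned or container-limited machine, a quick sketch for comparing `num_workers` against the CPUs the process can actually use (cgroup limits in containers can make `os.cpu_count()` misleading, so `sched_getaffinity` is checked where available):

```python
import os

num_workers = 12  # the DataLoader setting from the discussion above

logical_cpus = os.cpu_count()
try:
    # On Linux this reflects the CPUs this process is allowed to run on.
    usable_cpus = len(os.sched_getaffinity(0))
except AttributeError:  # sched_getaffinity is not available on macOS/Windows
    usable_cpus = logical_cpus

print(f'logical CPUs: {logical_cpus}, usable by this process: {usable_cpus}')
if num_workers > usable_cpus:
    print('num_workers exceeds usable CPUs; worker processes will contend for cores')
```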

Just copy `layoutlmv3-base-finetuned-publaynet/config.json` to `/content/output_dir/`; the code needs it.

`!pip install autogluon scikit-learn==1.5.2` works on __kaggle T4x2__. Ref: ['super' object has no attribute '__sklearn_tags__'](https://stackoverflow.com/questions/79290968/super-object-has-no-attribute-sklearn-tags)

I added `assigned_gt_index = paddle.cast(assigned_gt_index, dtype="int32")` before the line `assigned_gt_index = assigned_gt_index + batch_ind * num_max_boxes`. It works for `paddle3.0.0b1` both on `Ubuntu 22.04 cu188 RTX4090` and `WSL2...
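The pattern behind this fix is dtype alignment before arithmetic: offsetting per-image box indices into a flat batch index only works when both operands share a dtype. A NumPy sketch of the same idea (illustrative only, not the PaddleDetection code; the shapes and values are made up):

```python
import numpy as np

num_max_boxes = 4
# Per-image assigned gt indices for a batch of 2 images
# (int64 here, standing in for the mismatched dtype in the original code).
assigned_gt_index = np.array([[0, 1, 1, 3],
                              [2, 0, 3, 1]], dtype=np.int64)
batch_ind = np.arange(2, dtype=np.int32).reshape(-1, 1)

# Cast first so the addition happens in a single dtype, mirroring the fix above.
assigned_gt_index = assigned_gt_index.astype(np.int32)
flat_index = assigned_gt_index + batch_ind * num_max_boxes
print(flat_index)
# [[0 1 1 3]
#  [6 4 7 5]]
```

NumPy promotes mixed dtypes silently, but Paddle is stricter about dtype mismatches in elementwise ops, which is why the explicit `paddle.cast` is needed there.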

Starting from `>=0.1.21` ([[Bugs] fix dispatch bugs](https://github.com/InternLM/xtuner/commit/c2328a02531ed17a96aef1c82584118fe2bac6bf)), xtuner checks for `rope_theta` by default. Newer models such as the `internlm2` series include this parameter in their `config.json`, but older models such as `internlm-chat-7b` do not: I checked the latest tags of the `internlm-chat-7b` repos on both ModelScope and HF, and neither has it. If you use a recent xtuner with an older `internlm` model, you can manually add the line `"rope_theta": 1000000` to `config.json`. I have verified that training works after this change, though I do not understand what this parameter means or what other consequences the change might have.
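A minimal sketch for patching an old-style `config.json` in place (the file contents here are a made-up stand-in for an old InternLM config, and whether `1000000` is the correct theta for `internlm-chat-7b` is an assumption, as noted above):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical old-style config that predates the rope_theta parameter.
old_config = {'model_type': 'internlm', 'hidden_size': 4096}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / 'config.json'
    config_path.write_text(json.dumps(old_config))

    config = json.loads(config_path.read_text())
    # Add rope_theta only when the model's config lacks it; existing
    # values (as in internlm2 configs) are left untouched.
    config.setdefault('rope_theta', 1000000)
    config_path.write_text(json.dumps(config, indent=2))

    print(json.loads(config_path.read_text())['rope_theta'])  # 1000000
```

Using `setdefault` keeps the edit idempotent, so running it against a newer model that already defines `rope_theta` changes nothing.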