nsys report for the one-yolov5/classify/train.py script [2023-03-29]
- Introduction
- [one-yolo test results]
- [one-yolov5 project data]
- [one-yolo detailed test data]
- [Fix plan]
- [References]
Introduction
Two nsys reports were collected for one-yolov5/classify/train.py, one with one-yolo (OneFlow) and one with torch-yolo (PyTorch):
one-yolo_profile: 03-29-07-10profile.zip
torch-yolo_profile: torch_03-29-08-37profile.zip
one-yolo test results
https://github.com/Oneflow-Inc/one-yolov5/blob/f1aaf236d05d46b5aea50bf4318edacbcd687b38/classify/train.py#L245
| | one-yolo | torch-yolo |
|---|---|---|
| Time spent on the tloss line | 99 ms | 14 ms |
Notes:
- flow.version='0.9.1.dev20230327+cu117'
- torch.version='1.13.0+cu117'
- Both runs train in float32.
- Both launch commands use batch-size=256, epochs=6, and the yolov5s-cls model.
- Machine: A100
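Recording the framework versions next to each profile makes the comparison easier to reproduce. The following is a minimal sketch using only the standard library; the distribution names `oneflow` and `torch` are assumptions about how the packages were installed, and `installed_versions` is a hypothetical helper, not part of one-yolov5.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # package is not installed in this environment
    return found


# Distribution names are assumptions; adjust them to the local install.
report = installed_versions(["oneflow", "torch"])
for name, ver in report.items():
    print(f"{name}: {ver}")
```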
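Conclusion: in the nsys analysis, the tloss line is markedly slower in one-yolo than in torch-yolo (99 ms vs. 14 ms, roughly 7x). Optimizing this line should give a large improvement in training speed.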
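To cross-check the per-line figure outside nsys, the line can be wrapped in a small wall-clock timer. The sketch below assumes the profiled line is a running-mean update of the loss, as in upstream YOLOv5; that has not been confirmed here for the pinned commit. `timed` and its `sync` parameter are hypothetical names. In a real run, `sync` would be a device synchronization function such as `torch.cuda.synchronize` (or the OneFlow equivalent), so that pending GPU work is not billed to the timed line.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results, sync=None):
    """Append the wall-clock milliseconds of the enclosed block to results[label]."""
    if sync is not None:
        sync()  # drain pending device work before starting the clock
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()  # wait for work launched inside the block
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        results.setdefault(label, []).append(elapsed_ms)


results = {}
tloss = 0.0
losses = [2.0, 1.0, 0.5]  # stand-in values; real code would read the batch loss
for i, loss in enumerate(losses):
    with timed("tloss", results):
        tloss = (tloss * i + loss) / (i + 1)  # assumed running-mean update

print(f"mean loss: {tloss:.4f}, timings (ms): {results['tloss']}")
```

Comparing the timings with and without `sync` gives a rough idea of how much of the measured cost is the line itself and how much is waiting for earlier kernels.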
one-yolov5 project data
- Project: https://github.com/Oneflow-Inc/one-yolov5
- Dataset path: @oneflow-25:/data/home/fengwen/imagenette160
- Weights path: @oneflow-25:/data/home/fengwen/weight_v1_2_0
If running nsys fails with the following error:
The target application terminated. One or more process it created re-parented.
Waiting for termination of re-parented processes.
Use the `--wait` option to modify this behavior.
comment out the check_git_status() line in train.py.
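The edit can also be scripted. This is a minimal sketch that comments out every uncommented line containing the call; `comment_out_call` is a hypothetical helper and the sample source is illustrative, not the real contents of train.py. It assumes the call is not the only statement in its block, since commenting out a lone statement would leave an empty block and break the file.

```python
def comment_out_call(source, call="check_git_status()"):
    """Return `source` with every uncommented line containing `call` commented out."""
    out = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if call in stripped and not stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]
            line = f"{indent}# {stripped}"  # keep indentation, prefix with '# '
        out.append(line)
    return "".join(out)


# Illustrative snippet only; the real train.py may differ.
sample = (
    "from utils.general import check_git_status\n"
    "if RANK in {-1, 0}:\n"
    "    print_args(vars(opt))\n"
    "    check_git_status()\n"
)
patched = comment_out_call(sample)
print(patched)
```

The import line is left untouched because it names the function without calling it, and running the helper twice leaves the text unchanged.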
one-yolo detailed test data
one-yolov5 launch command
DATESTR=$(date +"%m-%d-%H-%M")
cd ~/one-yolov5
set -e
# py-spy record -o profile.svg --native --
run_cmd="/usr/local/cuda/bin/nsys profile -o runs/${DATESTR}profile python \
classify/train.py \
--model runs/yolov5s-cls.pt \
--data ../datasets/imagenette160 \
--img 224 \
--batch 256 \
--epochs 6 \
--project One-YOLOv5_v_1_2_0_train \
--name yolov5n-default \
--multi_tensor_optimizer \
--lr0 0.1 --optimizer SGD "
echo ${run_cmd}
eval ${run_cmd}
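The same command can be assembled as an argument list, which avoids the quoting pitfalls of building a string and passing it to `eval`. This is a sketch only: `build_nsys_command` is a hypothetical helper, every flag and path is copied from the script above, and nothing is executed.

```python
import shlex
from datetime import datetime


def build_nsys_command(datestr):
    """Return the nsys profiling command from the script above as an argv list."""
    return [
        "/usr/local/cuda/bin/nsys", "profile",
        "-o", f"runs/{datestr}profile",
        "python", "classify/train.py",
        "--model", "runs/yolov5s-cls.pt",
        "--data", "../datasets/imagenette160",
        "--img", "224",
        "--batch", "256",
        "--epochs", "6",
        "--project", "One-YOLOv5_v_1_2_0_train",
        "--name", "yolov5n-default",
        "--multi_tensor_optimizer",
        "--lr0", "0.1",
        "--optimizer", "SGD",
    ]


# Same timestamp format as DATESTR in the shell script.
datestr = datetime.now().strftime("%m-%d-%H-%M")
cmd = build_nsys_command(datestr)
print(shlex.join(cmd))  # shell-quoted form, for logging
```

To launch it, the list could be passed to `subprocess.run(cmd, check=True)` from the repository root.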
one-yolo_profile: 03-29-07-10profile.zip
torch-yolo_profile: torch_03-29-08-37profile.zip
Fix plan
In progress...
References
- https://github.com/Oneflow-Inc/oneflow/pull/9394
- /data/home/fengwen/package/oneflow/.idea/make_flow.sh