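Add performance tests (profile) for operators
Task
To judge operator efficiency more directly, make comparison with PyTorch easier, and pinpoint time-consuming operators precisely and quickly while debugging new models, performance tests need to be added for the operators under oneflow/python/oneflow/test/modules.
- How to add them: see https://github.com/Oneflow-Inc/OneTeam/blob/master/tutorial/howto_test_user_op.md#%E6%80%A7%E8%83%BD%E6%B5%8B%E8%AF%95
The comments below contain a table of the test programs under oneflow/python/oneflow/test/modules. Because there are many operators, they are grouped and sorted by first letter; the table is still being updated. Automatic tallying of profile coverage is planned for later.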
| Code | Owner | Needs profile / any anomaly | PR |
|---|---|---|---|
| test_abs.py | 刘轩 | Yes | https://github.com/Oneflow-Inc/oneflow/pull/8889 |
| test_activation.py | 刘轩 | Yes | Same as above |
| test_adaptive.py | 刘轩 | Same as above | |
| test_adaptive_pool.py | 刘轩 | Same as above | |
| test_addcdiv.py | 刘轩 | Same as above | |
| test_addcmul.py | 刘轩 | Same as above | |
| test_addmm.py | 刘轩 | Same as above | |
| test_affine_grid.py | 刘轩 | Same as above | |
| test_allreduce.py | 刘轩 | No | |
| test_amax.py | 刘轩 | Same as above | |
| test_amin.py | 刘轩 | Same as above | |
| test_arange.py | 刘轩 | Same as above | |
| test_argmax.py | 刘轩 | Same as above | |
| test_argmin.py | 刘轩 | assert error | |
| test_argsort.py | 刘轩 | Same as above | |
| test_argwhere.py | 刘轩 | | |
| test_as_stride.py | 刘轩 | | |
| test_as_tensor.py | 刘轩 | No | |
| test_autograd_function.py | 刘轩 | No | |
| test_autograd_mode.py | 刘轩 | No | |
| test_autograd.py | 刘轩 | No | |
| test_avgpool.py | 刘轩 | | |
| test_batch_gather.py | 刘轩 | No | |
| test_batchnorm_add_relu.py | 刘轩 | No | |
| test_batchnorm.py | 刘轩 | No | |
| test_bernoulli.py | 刘轩 | Raises an error | |
| test_bmm.py | 刘轩 | | |
| test_broadcast_like.py | 刘轩 | No | |
| test_broadcast_ops.py | 刘轩 | | |
| test_cast.py | 刘轩 | No | |
| test_ceil.py | 刘轩 | | |
| test_check_meta_consistency.py | 刘轩 | No | |
| test_chunk.py | 刘轩 | | |
| test_clamp.py | 刘轩 | | |
| test_clip_grad.py | 刘轩 | No | |
| test_coco_reader.py | 刘轩 | No | |
| test_coin_flip.py | 刘轩 | No | |
| test_comb2to2d.py | 刘轩 | No | |
| test_combined_margin_loss.py | 刘轩 | | |
| test_comm_ops.py | 刘轩 | | |
| test_comm.py | 刘轩 | | |
| test_concat.py | 刘轩 | | |
| test_consistent_broadcast_matmul.py | 刘轩 | | |
| test_consistent_inv.py | 刘轩 | | |
| test_consistent_vector_matrix_product.py | 刘轩 | | |
| test_constant_pad.py | 刘轩 | | |
| test_constant.py | 刘轩 | | |
| test_contiguous.py | 刘轩 | | |
| test_conv1d.py | 刘轩 | | |
| test_conv2d.py | 刘轩 | | |
| test_copy.py | 刘轩 | | |
| test_cosine_similarity.py | 刘轩 | | |
| test_ctc_greedy_decoder.py | 刘轩 | | |
| test_ctc_loss.py | 刘轩 | | |
| test_cublas_fused_mlp.py | 刘轩 | | |
| test_cum_ops.py | 刘轩 | | |
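
The automatic tally of profile coverage planned in the task description could start from something like the sketch below. It is only an illustration under stated assumptions: it treats any line beginning with `@profile(` as a profile test, and the names `profile_coverage`, `format_report`, and `demo` are hypothetical helpers, not part of OneFlow.

```python
import re
import tempfile
from pathlib import Path

# Assumption: a profile test is any function decorated with a line
# that starts with "@profile(", e.g. "@profile(torch.abs)".
PROFILE_DECORATOR = re.compile(r"^\s*@profile\(", re.MULTILINE)


def profile_coverage(test_dir):
    """Map each test_*.py file name to the number of @profile decorators in it."""
    coverage = {}
    for path in sorted(Path(test_dir).glob("test_*.py")):
        source = path.read_text(encoding="utf-8")
        coverage[path.name] = len(PROFILE_DECORATOR.findall(source))
    return coverage


def format_report(coverage):
    """Render one line per file, flagging files without any profile test."""
    lines = []
    for name, count in coverage.items():
        status = "has profile" if count else "missing profile"
        lines.append(f"{name}: {status} ({count})")
    return "\n".join(lines)


def demo():
    """Run the scan on a throwaway directory holding two tiny example files."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "test_abs.py").write_text(
            "@profile(torch.abs)\ndef profile_abs(test_case):\n    pass\n",
            encoding="utf-8",
        )
        (Path(tmp) / "test_cast.py").write_text(
            "def test_cast(test_case):\n    pass\n",
            encoding="utf-8",
        )
        return profile_coverage(tmp)


if __name__ == "__main__":
    print(format_report(demo()))
```

Pointing `profile_coverage` at the real modules directory would give a starting point for filling in the third column of the table, although files marked "No" would still need a manual decision.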
An error occurs when adding a profile test for class TestGelu in test_activation.py. The program is as follows:
@profile(torch.nn.GELU)
def profile_gelu(test_case):
    torch.nn.GELU(torch.ones(1, 128, 28, 28))
    torch.nn.GELU(torch.ones(16, 128, 28, 28))
The error is as follows:
======================================================================
ERROR: profile_gelu (test_activation.TestGelu)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
res = f(*args, **kwargs)
File "/workspace/oneflow/python/oneflow/test/modules/test_activation.py", line 275, in profile_gelu
torch.nn.GELU(torch.ones(1, 128, 28, 28))
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
*args, **kwargs
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 223, in profiled_op
additional_description,
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 96, in run_torch
op(*args, **kwargs)
TypeError: __init__() takes 1 positional argument but 2 were given
----------------------------------------------------------------------
Ran 1 test in 3.858s
FAILED (errors=1)
----------------------------------------------------------------------
Summary ("KT" means "Kernel Time", "ET" means "End-to-end Time", in microseconds; "BW" means "Bandwidth" in GB/s):
OP Args Lib KT(GPU) BW(GPU) KT(1 CPU) ET(1 CPU) KT(32 CPU) ET(32 CPU)
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
nn.GELU ones(1, 128, 28, 28) OF - - - 14.6 - 14.6
nn.GELU ones(16, 128, 28, 28) OF - - - 119.9 - 17.2
Only the functional API is supported for now, so this needs to be changed to torch.nn.functional.gelu; see the existing profile_relu for reference.
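
The TypeError in the traceback above is consistent with the input being handed to a module constructor instead of to a function. The plain-Python sketch below illustrates that difference only; `GELUModule` and `gelu` are hypothetical stand-ins written for this example, not the real torch or oneflow objects, and the harness itself is not modelled.

```python
import math


def gelu(x):
    """Functional form: takes the input directly (exact GELU on a single float)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class GELUModule:
    """Module form: the constructor takes no input; the instance is called with it."""

    def __init__(self):
        pass

    def __call__(self, x):
        return gelu(x)


def call_like_the_failing_test(x):
    """Pass the input straight to the constructor and return the error message."""
    try:
        GELUModule(x)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(call_like_the_failing_test(1.0))  # constructor rejects the extra argument
    print(GELUModule()(1.0))                # construct first, then call
    print(gelu(1.0))                        # functional call, no construction step
```

A profile test written against the functional form passes the tensor to the function in a single call, which is the shape the existing profile_relu is said to use.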
Issue report
Problem description: when profiling flow.abs() in test_abs.py, the error below occurs and the result for the torch.abs(torch.ones(16, 128, 28, 28)) call is not output.
(base) root@training-webide-b81f61-b81f61-webide-master-0:/workspace/oneflow/python/oneflow/test/modules# python3 -m oneflow.autoprof test_abs.py
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:01:19 89274:89274 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:01:19 89274:89274 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:01:19 89274:89274 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:01:19 89274:89274 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:01:20 89274:89274 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:01:21 89274:89274 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:01:21 89274:89274 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:01:21 89274:89274 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:01:21 89274:89274 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:01:21 89274:89274 output_membuf.h:71] Completed Stage: Post Processing
F
======================================================================
FAIL: profile_abs (test_abs.TestAbsModule)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
res = f(*args, **kwargs)
File "/workspace/oneflow/python/oneflow/test/modules/test_abs.py", line 43, in profile_abs
torch.abs(torch.ones(16, 128, 28, 28))
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
*args, **kwargs
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 228, in profiled_op
return _profiler_hook(result)
AssertionError
----------------------------------------------------------------------
Ran 1 test in 6.668s
FAILED (failures=1)
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
.
----------------------------------------------------------------------
Ran 1 test in 17.485s
OK
----------------------------------------------------------------------
Summary ("KT" means "Kernel Time", "ET" means "End-to-end Time", in microseconds; "BW" means "Bandwidth" in GB/s):
OP Args Lib KT(GPU) BW(GPU) KT(1 CPU) ET(1 CPU) KT(32 CPU) ET(32 CPU)
─────────────────────────────────────────────────────────────────────────────────────────────────────────
abs ones(1, 128, 28, 28) OF 6.1 - 60.2 61.7 52.4 53.8
abs ones(1, 128, 28, 28) PT 1.4 - 20.8 25.2 597.7 607.6
abs ones(16, 128, 28, 28) PT 16.3 - 628.4 633.9 603.8 696.1
After a few more runs, however, the error disappeared and the full test results were output, as shown below.
(base) root@training-webide-b81f61-b81f61-webide-master-0:/workspace/oneflow/python/oneflow/test/modules# python3 -m oneflow.autoprof test_abs.py
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:03:06 90098:90098 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:03:06 90098:90098 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:03:06 90098:90098 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:03:06 90098:90098 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:03:07 90098:90098 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:03:07 90098:90098 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:03:07 90098:90098 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:03:08 90098:90098 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:03:08 90098:90098 output_membuf.h:71] Completed Stage: Post Processing
.
----------------------------------------------------------------------
Ran 1 test in 6.756s
OK
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
.
----------------------------------------------------------------------
Ran 1 test in 18.993s
OK
----------------------------------------------------------------------
Summary ("KT" means "Kernel Time", "ET" means "End-to-end Time", in microseconds; "BW" means "Bandwidth" in GB/s):
OP Args Lib KT(GPU) BW(GPU) KT(1 CPU) ET(1 CPU) KT(32 CPU) ET(32 CPU)
─────────────────────────────────────────────────────────────────────────────────────────────────────────
abs ones(1, 128, 28, 28) OF 6.1 - 61.6 65.1 52.7 56.0
abs ones(16, 128, 28, 28) OF 37.9 - 855.6 860.3 848.0 852.7
abs ones(1, 128, 28, 28) PT 1.4 - 28.1 32.1 303.1 311.2
abs ones(16, 128, 28, 28) PT 16.4 - 604.2 609.6 1291.2 1300.2
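
To compare the two runs above without reading the tables by eye, the summary rows can be parsed with a short script. This is a rough sketch that assumes the row layout seen in this thread (op, args, an `OF` or `PT` tag, then six metric columns with `-` for missing values); `parse_summary`, `missing_libs`, and `ratios` are hypothetical helper names, not autoprof features.

```python
import re

# One summary row: op name, argument description, library tag, metric columns.
ROW = re.compile(r"^(\S+)\s+(.+?)\s+(OF|PT)\s+(.+)$")
COLUMNS = ["KT(GPU)", "BW(GPU)", "KT(1 CPU)", "ET(1 CPU)", "KT(32 CPU)", "ET(32 CPU)"]

FAILED_RUN = """
abs ones(1, 128, 28, 28) OF 6.1 - 60.2 61.7 52.4 53.8
abs ones(1, 128, 28, 28) PT 1.4 - 20.8 25.2 597.7 607.6
abs ones(16, 128, 28, 28) PT 16.3 - 628.4 633.9 603.8 696.1
"""

FULL_RUN = """
abs ones(1, 128, 28, 28) OF 6.1 - 61.6 65.1 52.7 56.0
abs ones(16, 128, 28, 28) OF 37.9 - 855.6 860.3 848.0 852.7
abs ones(1, 128, 28, 28) PT 1.4 - 28.1 32.1 303.1 311.2
abs ones(16, 128, 28, 28) PT 16.4 - 604.2 609.6 1291.2 1300.2
"""


def parse_summary(text):
    """Return {(op, args): {lib: {column: float or None}}} for rows that parse."""
    table = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue
        op, args, lib, rest = match.groups()
        cells = rest.split()
        if len(cells) != len(COLUMNS):
            continue
        try:
            metrics = {c: None if v == "-" else float(v) for c, v in zip(COLUMNS, cells)}
        except ValueError:
            continue  # not a metric row after all
        table.setdefault((op, args), {})[lib] = metrics
    return table


def missing_libs(table):
    """List, per (op, args), which of OF/PT has no row in the summary."""
    expected = {"OF", "PT"}
    return {key: sorted(expected - set(libs)) for key, libs in table.items()
            if set(libs) != expected}


def ratios(table, column):
    """OF value divided by PT value for one column, where both are present."""
    out = {}
    for key, libs in table.items():
        of = libs.get("OF", {}).get(column)
        pt = libs.get("PT", {}).get(column)
        if of is not None and pt:
            out[key] = of / pt
    return out


if __name__ == "__main__":
    print(missing_libs(parse_summary(FAILED_RUN)))
    for key, value in ratios(parse_summary(FULL_RUN), "KT(GPU)").items():
        print(key, round(value, 2))
```

Running `missing_libs` on the first summary flags the row that the failed run dropped, and `ratios` on the second gives an OF/PT factor per input shape.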
The following AssertionError occurs when profiling logsigmoid, selu, and sigmoid:
======================================================================
FAIL: profile_logsigmoid (test_activation.TestLogSigmoidModule)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
res = f(*args, **kwargs)
File "/workspace/oneflow/python/oneflow/test/modules/test_activation.py", line 500, in profile_logsigmoid
torch.nn.functional.logsigmoid(torch.ones(1, 128, 28, 28))
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
*args, **kwargs
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 228, in profiled_op
return _profiler_hook(result)
AssertionError
======================================================================
FAIL: profile_selu (test_activation.TestSeluModule)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
res = f(*args, **kwargs)
File "/workspace/oneflow/python/oneflow/test/modules/test_activation.py", line 760, in profile_selu
torch.nn.functional.selu(torch.ones(1, 128, 28, 28))
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
*args, **kwargs
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 228, in profiled_op
return _profiler_hook(result)
AssertionError
======================================================================
FAIL: profile_sigmoid (test_activation.TestSigmoidModule)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
res = f(*args, **kwargs)
File "/workspace/oneflow/python/oneflow/test/modules/test_activation.py", line 326, in profile_sigmoid
torch.sigmoid(torch.ones(1, 128, 28, 28))
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
*args, **kwargs
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 228, in profiled_op
return _profiler_hook(result)
AssertionError
----------------------------------------------------------------------
Ran 22 tests in 833.476s
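
When a run reports several failures like the ones above, the failing tests can be pulled out of the unittest output and turned into single-test targets of the `module.Class.method` form that is passed to `python3 -m oneflow.autoprof` elsewhere in this thread. The sketch below relies only on the `FAIL:` / `ERROR:` header lines that unittest prints; `failing_tests` and `rerun_targets` are hypothetical helper names.

```python
import re

# Header printed by unittest for each failing test, e.g.
# "FAIL: profile_selu (test_activation.TestSeluModule)".
FAILURE_HEADER = re.compile(r"^(FAIL|ERROR): (\w+) \(([\w.]+)\)\s*$", re.MULTILINE)

SAMPLE_OUTPUT = """\
======================================================================
FAIL: profile_logsigmoid (test_activation.TestLogSigmoidModule)
----------------------------------------------------------------------
AssertionError
======================================================================
FAIL: profile_selu (test_activation.TestSeluModule)
----------------------------------------------------------------------
AssertionError
======================================================================
FAIL: profile_sigmoid (test_activation.TestSigmoidModule)
----------------------------------------------------------------------
AssertionError
"""


def failing_tests(output):
    """Return (kind, method, module.Class) tuples in order of appearance."""
    return FAILURE_HEADER.findall(output)


def rerun_targets(output):
    """Build module.Class.method targets for rerunning each failing test alone."""
    return [f"{owner}.{method}" for _kind, method, owner in failing_tests(output)]


if __name__ == "__main__":
    for target in rerun_targets(SAMPLE_OUTPUT):
        print(target)
```

Rerunning one target at a time keeps each log short enough to attach to a report.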
## oneflow.argmin interface: assert error
Problem description: testing according to the torch documentation produces an assert error. Test code:
a=torch.ones(4,4)
torch.argmin(a)
Error:
(base) root@training-webide-b81f61-b81f61-webide-master-0:/workspace/oneflow/python/oneflow/test/modules# python3 -m oneflow.autoprof test_argmin.TestArgmin.profile_argmin
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
STAGE:2022-08-15 16:12:50 135230:135230 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 16:12:50 135230:135230 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 16:12:50 135230:135230 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-15 16:12:51 135230:135230 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 16:12:51 135230:135230 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 16:12:51 135230:135230 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-15 16:12:51 135230:135230 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 16:12:51 135230:135230 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 16:12:51 135230:135230 output_membuf.h:71] Completed Stage: Post Processing
F
======================================================================
FAIL: profile_argmin (test_argmin.TestArgmin)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
res = f(*args, **kwargs)
File "/workspace/oneflow/python/oneflow/test/modules/test_argmin.py", line 107, in profile_argmin
torch.argmin(a)
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
*args, **kwargs
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 228, in profiled_op
return _profiler_hook(result)
AssertionError
----------------------------------------------------------------------
Ran 1 test in 6.248s
FAILED (failures=1)
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
.
----------------------------------------------------------------------
Ran 1 test in 10.157s
OK
----------------------------------------------------------------------
Summary ("KT" means "Kernel Time", "ET" means "End-to-end Time", in microseconds; "BW" means "Bandwidth" in GB/s):
OP Args Lib KT(GPU) BW(GPU) KT(1 CPU) ET(1 CPU) KT(32 CPU) ET(32 CPU)
──────────────────────────────────────────────────────────────────────────────────────────────────
argmin randn(4, 4) PT 2.0 - 9.3 13.0 11.8 16.2
oneflow.nn.BatchNorm1d and similar: the module type becomes NoneType during profile testing
Problem description: when profiling this family of interfaces, TypeError: 'NoneType' object is not callable is raised, whereas running the same code on the command line prints the expected module type.
Test program
@profile(torch.nn.BatchNorm2d)
def profile_BatchNorm2d(test_case):
    m1 = torch.nn.BatchNorm2d(10)
    m2 = torch.nn.BatchNorm2d(10, affine=False)
    print(type(m1),type(m2))
    input1 = torch.ones(2, 10, 8, 3)
    input2 = torch.ones(2, 10, 8, 3)
    out1=m1(input1)
    out2=m2(input2)
Error message
(base) root@training-webide-b81f61-b81f61-webide-master-0:/workspace/oneflow/python/oneflow/test/modules# python3 -m oneflow.autoprof test_batchnorm.TestBatchNormModule.profile_BatchNorm2d
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
STAGE:2022-08-15 17:48:23 150614:150614 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 17:48:26 150614:150614 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 17:48:26 150614:150614 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-15 17:48:26 150614:150614 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 17:48:26 150614:150614 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 17:48:26 150614:150614 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-15 17:48:27 150614:150614 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 17:48:27 150614:150614 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 17:48:27 150614:150614 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-15 17:48:27 150614:150614 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 17:48:27 150614:150614 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 17:48:27 150614:150614 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-15 17:48:27 150614:150614 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 17:48:28 150614:150614 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 17:48:28 150614:150614 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-15 17:48:28 150614:150614 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 17:48:28 150614:150614 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 17:48:28 150614:150614 output_membuf.h:71] Completed Stage: Post Processing
<class 'NoneType'> <class 'NoneType'>
E
======================================================================
ERROR: profile_BatchNorm2d (test_batchnorm.TestBatchNormModule)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
res = f(*args, **kwargs)
File "/workspace/oneflow/python/oneflow/test/modules/test_batchnorm.py", line 83, in profile_BatchNorm2d
out1=m1(input1)
TypeError: 'NoneType' object is not callable
----------------------------------------------------------------------
Ran 1 test in 5.203s
FAILED (errors=1)
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
<class 'NoneType'> <class 'NoneType'>
E
======================================================================
ERROR: profile_BatchNorm2d (test_batchnorm.TestBatchNormModule)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
res = f(*args, **kwargs)
File "/workspace/oneflow/python/oneflow/test/modules/test_batchnorm.py", line 83, in profile_BatchNorm2d
out1=m1(input1)
TypeError: 'NoneType' object is not callable
----------------------------------------------------------------------
Ran 1 test in 11.976s
FAILED (errors=1)
----------------------------------------------------------------------
Summary ("KT" means "Kernel Time", "ET" means "End-to-end Time", in microseconds; "BW" means "Bandwidth" in GB/s):
OP Args Lib KT(GPU) BW(GPU) KT(1 CPU) ET(1 CPU) KT(32 CPU) ET(32 CPU)
───────────────────────────────────────────────────────────────────────────────────────────────────────────────
nn.BatchNorm2d 10 OF - - 12.2 244.4 12.6 245.4
nn.BatchNorm2d 10, affine=False OF - - 9.1 191.8 9.2 191.0
nn.BatchNorm2d 10 PT - - 21.6 190.8 21.2 165.8
nn.BatchNorm2d 10, affine=False PT - - 10.9 69.7 16.1 212.4
oneflow.bernoulli: profile testing reports Error Type: oneflow.ErrorProto.check_failed_error
Problem description: when profiling this interface, Error Type: oneflow.ErrorProto.check_failed_error is reported.
Test program
@profile(torch.bernoulli)
def profile_bernoulli(test_case):
    torch.bernoulli(torch.ones(3, 3))
    torch.bernoulli(torch.zeros(3, 3))
Error message
(base) root@training-webide-b81f61-b81f61-webide-master-0:/workspace/oneflow/python/oneflow/test/modules# python3 -m oneflow.autoprof test_bernoulli.TestBernoulli.profile_bernoulli
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
E
======================================================================
ERROR: profile_bernoulli (test_bernoulli.TestBernoulli)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
res = f(*args, **kwargs)
File "/workspace/oneflow/python/oneflow/test/modules/test_bernoulli.py", line 73, in profile_bernoulli
torch.bernoulli(torch.ones(3, 3))
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
*args, **kwargs
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 208, in profiled_op
additional_description,
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 147, in run_flow
op(*args, **kwargs)
RuntimeError: Check failed: (device.type()) == ("cpu") (cuda vs cpu)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 143, in Dispatch<oneflow::one::Tensor>
Dispatch<TensorTuple>(op_expr, inputs, ctx)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 134, in Dispatch<oneflow::one::TensorTuple>
Dispatch(op_expr, processor.inputs(), outputs.get(), ctx)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter.cpp", line 96, in Apply
internal_->Apply(op_expr, inputs, outputs, ctx)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/eager_local_op_interpreter.cpp", line 83, in NaiveInterpret
[&]() -> Maybe<const LocalTensorInferResult> { LocalTensorMetaInferArgs ... Data_YouAreNotAllowedToCallThisFuncOutsideThisFile(); }()
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/eager_local_op_interpreter.cpp", line 83, in operator()
user_op_expr.mut_local_tensor_infer_cache()->GetOrInfer(infer_args)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/local_tensor_infer_cache.cpp", line 199, in GetOrInfer
Infer(*user_op_expr, infer_args)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/local_tensor_infer_cache.cpp", line 158, in Infer
CheckIsDeviceSupportedByOp(*default_device, user_op_expr.op_type_name())
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/local_tensor_infer_cache.cpp", line 31, in CheckIsDeviceSupportedByOp
Error Type: oneflow.ErrorProto.check_failed_error
----------------------------------------------------------------------
Ran 1 test in 2.459s
FAILED (errors=1)
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
.
----------------------------------------------------------------------
Ran 1 test in 7.622s
OK
----------------------------------------------------------------------
Summary ("KT" means "Kernel Time", "ET" means "End-to-end Time", in microseconds; "BW" means "Bandwidth" in GB/s):
OP Args Lib KT(GPU) BW(GPU) KT(1 CPU) ET(1 CPU) KT(32 CPU) ET(32 CPU)
─────────────────────────────────────────────────────────────────────────────────────────────────────
bernoulli ones(3, 3) PT 3.0 - 13.2 18.1 5.4 7.4
bernoulli zeros(3, 3) PT 3.0 - 5.3 7.2 5.1 6.9
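oneflow.nn.BatchNorm1d and similar: the module type becomes NoneType during profile testing
This is not a functional interface, and autoprof does not support it yet. In addition, we do not yet have an F.batch_norm aligned with PyTorch, so ignore it for now.
oneflow.bernoulli: profile testing reports Error Type: oneflow.ErrorProto.check_failed_error
This happens because flow.bernoulli does not support CUDA tensors; it has been reported to the relevant colleagues. It can also be skipped for now. I will enhance autoprof so that it supports CPU-only testing.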
test_cosine_similarity raises AssertionError
Minimal reproduction code
@profile(torch.nn.functional.cosine_similarity)
def profile_cosine_similarity(test_case):
    input1 = torch.ones(100,128)
    input2 = torch.ones(100,128)
    torch.nn.functional.cosine_similarity(input1, input2)
    torch.nn.functional.cosine_similarity(input1, input2, dim=0)
Error and output
(base) root@training-webide-b81f61-b81f61-webide-master-0:/workspace/oneflow/python/oneflow/test/modules# python3 -m oneflow.autoprof test_cosine_similarity.TestCosineSimilarity.profile_cosine_similarity
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
STAGE:2022-08-29 10:35:35 421357:421357 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-29 10:35:35 421357:421357 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-29 10:35:35 421357:421357 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-29 10:35:54 421357:421357 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-29 10:35:55 421357:421357 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-29 10:35:55 421357:421357 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-29 10:35:55 421357:421357 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-29 10:35:57 421357:421357 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-29 10:35:57 421357:421357 output_membuf.h:71] Completed Stage: Post Processing
F
======================================================================
FAIL: profile_cosine_similarity (test_cosine_similarity.TestCosineSimilarity)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
res = f(*args, **kwargs)
File "/workspace/oneflow/python/oneflow/test/modules/test_cosine_similarity.py", line 67, in profile_cosine_similarity
torch.nn.functional.cosine_similarity(input1, input2)
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
*args, **kwargs
File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 228, in profiled_op
return _profiler_hook(result)
AssertionError
----------------------------------------------------------------------
Ran 1 test in 26.624s
FAILED (failures=1)
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
.
----------------------------------------------------------------------
Ran 1 test in 23.765s
OK
----------------------------------------------------------------------
Summary ("KT" means "Kernel Time", "ET" means "End-to-end Time", in microseconds; "BW" means "Bandwidth" in GB/s):
KT(1 ET(1 KT(32 ET(32
OP Args Lib KT(GPU) BW(GPU) CPU) CPU) CPU) CPU)
────────────────────────────────────────────────────────────────────────────────────────────
nn.func… ones(10… PT 23.6 - 78.0 80.8 2180.6 2189.5
128),
ones(10…
128)
nn.func… ones(10… PT 29.2 - 76.8 80.8 1401.8 1410.6
128),
ones(10…
128),
dim=0