
Add performance tests (profile) for operators

Open laoliu97 opened this issue 3 years ago • 11 comments

Task

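To gauge operator performance more directly and make comparison with PyTorch easier, and to pinpoint time-consuming operators quickly and precisely when debugging new models, performance tests need to be added for the operators under oneflow/python/oneflow/test/modules.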

  • For how to add a test, see: https://github.com/Oneflow-Inc/OneTeam/blob/master/tutorial/howto_test_user_op.md#%E6%80%A7%E8%83%BD%E6%B5%8B%E8%AF%95

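The comments below hold tables of the test files under oneflow/python/oneflow/test/modules. Because there are many operators, the files are grouped and sorted by first letter (still being updated). Automatic statistics on profile coverage are planned for later.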
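The planned automatic coverage statistics could start from a simple scan like the one below. This is only a sketch: it assumes a file counts as covered when it contains an `@profile(...)` decorator, as in the snippets posted in this thread, and `profile_coverage` is a hypothetical helper name, not part of OneFlow.

```python
import re
import tempfile
from pathlib import Path

# Assumption: a file "has a profile test" if it contains an `@profile(...)`
# decorator at the start of a line (ignoring indentation).
PROFILE_DECORATOR = re.compile(r"^\s*@profile\(", re.MULTILINE)


def profile_coverage(test_dir):
    """Map each test_*.py file name in test_dir to True/False (has @profile)."""
    coverage = {}
    for path in sorted(Path(test_dir).glob("test_*.py")):
        text = path.read_text(encoding="utf-8")
        coverage[path.name] = bool(PROFILE_DECORATOR.search(text))
    return coverage


if __name__ == "__main__":
    # Demo on a throwaway directory with two made-up files.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "test_abs.py").write_text(
            "@profile(torch.abs)\ndef profile_abs(test_case):\n    pass\n"
        )
        Path(tmp, "test_cast.py").write_text(
            "def test_cast(test_case):\n    pass\n"
        )
        for name, done in profile_coverage(tmp).items():
            print(f"{name}\t{'yes' if done else 'no'}")
```

In a real checkout, `test_dir` would point at oneflow/python/oneflow/test/modules. A text scan only shows that a decorator is present; it says nothing about whether the profile test actually passes.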

laoliu97 avatar Aug 09 '22 02:08 laoliu97

File Assignee Needs profile? / Any errors? PR
test_abs.py 刘轩 https://github.com/Oneflow-Inc/oneflow/pull/8889
test_activation.py 刘轩 same as above
test_adaptive.py 刘轩 same as above
test_adaptive_pool.py 刘轩 same as above
test_addcdiv.py 刘轩 same as above
test_addcmul.py 刘轩 same as above
test_addmm.py 刘轩 same as above
test_affine_grid.py 刘轩 same as above
test_allreduce.py 刘轩
test_amax.py 刘轩 same as above
test_amin.py 刘轩 same as above
test_arange.py 刘轩 same as above
test_argmax.py 刘轩 same as above
test_argmin.py 刘轩 assert error
test_argsort.py 刘轩 same as above
test_argwhere.py 刘轩
test_as_stride.py 刘轩
test_as_tensor.py 刘轩
test_autograd_function.py 刘轩
test_autograd_mode.py 刘轩
test_autograd.py 刘轩
test_avgpool.py 刘轩
test_batch_gather.py 刘轩
test_batchnorm_add_relu.py 刘轩
test_batchnorm.py 刘轩
test_bernoulli.py 刘轩 raises an error
test_bmm.py 刘轩
test_broadcast_like.py 刘轩
test_broadcast_ops.py 刘轩
test_cast.py 刘轩
test_ceil.py 刘轩
test_check_meta_consistency.py 刘轩
test_chunk.py 刘轩
test_clamp.py 刘轩
test_clip_grad.py 刘轩
test_coco_reader.py 刘轩
test_coin_flip.py 刘轩
test_comb2to2d.py 刘轩
test_combined_margin_loss.py 刘轩
test_comm_ops.py 刘轩
test_comm.py 刘轩
test_concat.py 刘轩
test_consistent_broadcast_matmul.py 刘轩
test_consistent_inv.py 刘轩
test_consistent_vector_matrix_product.py 刘轩
test_constant_pad.py 刘轩
test_constant.py 刘轩
test_contiguous.py 刘轩
test_conv1d.py 刘轩
test_conv2d.py 刘轩
test_copy.py 刘轩
test_cosine_similarity.py 刘轩
test_ctc_greedy_decoder.py 刘轩
test_ctc_loss.py 刘轩
test_cublas_fused_mlp.py 刘轩
test_cum_ops.py 刘轩

laoliu97 avatar Aug 09 '22 02:08 laoliu97


Adding a profile test to class TestGelu in test_activation.py raises an error. The code:

    @profile(torch.nn.GELU)
    def profile_gelu(test_case):
        torch.nn.GELU(torch.ones(1, 128, 28, 28))
        torch.nn.GELU(torch.ones(16, 128, 28, 28))

The error:

======================================================================
ERROR: profile_gelu (test_activation.TestGelu)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
    res = f(*args, **kwargs)
  File "/workspace/oneflow/python/oneflow/test/modules/test_activation.py", line 275, in profile_gelu
    torch.nn.GELU(torch.ones(1, 128, 28, 28))
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
    *args, **kwargs
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 223, in profiled_op
    additional_description,
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 96, in run_torch
    op(*args, **kwargs)
TypeError: __init__() takes 1 positional argument but 2 were given

----------------------------------------------------------------------
Ran 1 test in 3.858s

FAILED (errors=1)
----------------------------------------------------------------------
Summary ("KT" means "Kernel Time", "ET" means "End-to-end Time", in microseconds; "BW" means "Bandwidth" in GB/s):
                                                                                                               
  OP        Args                    Lib   KT(GPU)   BW(GPU)   KT(1 CPU)   ET(1 CPU)   KT(32 CPU)   ET(32 CPU)  
 ───────────────────────────────────────────────────────────────────────────────────────────────────────────── 
  nn.GELU   ones(1, 128, 28, 28)    OF    -         -         -           14.6        -            14.6        
  nn.GELU   ones(16, 128, 28, 28)   OF    -         -         -           119.9       -            17.2   

laoliu97 avatar Aug 09 '22 04:08 laoliu97

Only the functional API is supported for now, so change this to torch.nn.functional.gelu. See the existing profile_relu for reference.
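The TypeError in the traceback fits this explanation: `torch.nn.GELU(x)` hands the tensor to the module constructor, which takes no inputs, instead of applying the activation. Below is a dependency-free sketch of the difference. `gelu` and `Gelu` are hypothetical stand-ins for `torch.nn.functional.gelu` and `torch.nn.GELU`, written for plain floats, and are not the real implementations.

```python
import math


def gelu(x):
    """Functional form: takes the input directly (exact GELU for a float)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class Gelu:
    """Module form: constructed without inputs, then called on the input."""

    def __init__(self):
        pass

    def __call__(self, x):
        return gelu(x)


if __name__ == "__main__":
    print(gelu(1.0))    # functional call: one step
    print(Gelu()(1.0))  # module call: construct first, then apply
    try:
        Gelu(1.0)       # same mistake as torch.nn.GELU(torch.ones(...))
    except TypeError as err:
        print("TypeError:", err)
```

Applied to the failing test, this would mean decorating with `@profile(torch.nn.functional.gelu)` and calling `torch.nn.functional.gelu(torch.ones(1, 128, 28, 28))` in the body.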

daquexian avatar Aug 09 '22 04:08 daquexian

Problem report

Problem description: profiling flow.abs() in test_abs.py fails with the error shown below, and no result is printed for the torch.abs(torch.ones(16, 128, 28, 28)) call.

(base) root@training-webide-b81f61-b81f61-webide-master-0:/workspace/oneflow/python/oneflow/test/modules# python3 -m oneflow.autoprof test_abs.py
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:01:19 89274:89274 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:01:19 89274:89274 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:01:19 89274:89274 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:01:19 89274:89274 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:01:19 89274:89274 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:01:20 89274:89274 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:01:21 89274:89274 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:01:21 89274:89274 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:01:21 89274:89274 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:01:21 89274:89274 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:01:21 89274:89274 output_membuf.h:71] Completed Stage: Post Processing
F
======================================================================
FAIL: profile_abs (test_abs.TestAbsModule)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
    res = f(*args, **kwargs)
  File "/workspace/oneflow/python/oneflow/test/modules/test_abs.py", line 43, in profile_abs
    torch.abs(torch.ones(16, 128, 28, 28))
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
    *args, **kwargs
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 228, in profiled_op
    return _profiler_hook(result)
AssertionError

----------------------------------------------------------------------
Ran 1 test in 6.668s

FAILED (failures=1)
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
.
----------------------------------------------------------------------
Ran 1 test in 17.485s

OK
----------------------------------------------------------------------
Summary ("KT" means "Kernel Time", "ET" means "End-to-end Time", in microseconds; "BW" means "Bandwidth" in GB/s):
                                                                                                           
  OP    Args                    Lib   KT(GPU)   BW(GPU)   KT(1 CPU)   ET(1 CPU)   KT(32 CPU)   ET(32 CPU)  
 ───────────────────────────────────────────────────────────────────────────────────────────────────────── 
  abs   ones(1, 128, 28, 28)    OF    6.1       -         60.2        61.7        52.4         53.8        
  abs   ones(1, 128, 28, 28)    PT    1.4       -         20.8        25.2        597.7        607.6       
  abs   ones(16, 128, 28, 28)   PT    16.3      -         628.4       633.9       603.8        696.1       
                                                                                                           

After a few more runs, however, the error disappears and the full results are printed, as shown below.

(base) root@training-webide-b81f61-b81f61-webide-master-0:/workspace/oneflow/python/oneflow/test/modules# python3 -m oneflow.autoprof test_abs.py
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:03:06 90098:90098 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:03:06 90098:90098 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:03:06 90098:90098 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:03:06 90098:90098 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:03:06 90098:90098 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:03:07 90098:90098 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:03:07 90098:90098 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-10 15:03:07 90098:90098 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-10 15:03:08 90098:90098 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-10 15:03:08 90098:90098 output_membuf.h:71] Completed Stage: Post Processing
.
----------------------------------------------------------------------
Ran 1 test in 6.756s

OK
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
.
----------------------------------------------------------------------
Ran 1 test in 18.993s

OK
----------------------------------------------------------------------
Summary ("KT" means "Kernel Time", "ET" means "End-to-end Time", in microseconds; "BW" means "Bandwidth" in GB/s):
                                                                                                           
  OP    Args                    Lib   KT(GPU)   BW(GPU)   KT(1 CPU)   ET(1 CPU)   KT(32 CPU)   ET(32 CPU)  
 ───────────────────────────────────────────────────────────────────────────────────────────────────────── 
  abs   ones(1, 128, 28, 28)    OF    6.1       -         61.6        65.1        52.7         56.0        
  abs   ones(16, 128, 28, 28)   OF    37.9      -         855.6       860.3       848.0        852.7       
  abs   ones(1, 128, 28, 28)    PT    1.4       -         28.1        32.1        303.1        311.2       
  abs   ones(16, 128, 28, 28)   PT    16.4      -         604.2       609.6       1291.2       1300.2      
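Since the point of these tests is comparison with PyTorch, the summary rows can be turned into OneFlow/PyTorch ratios. The sketch below parses the four rows of the successful run above. It assumes cells are separated by two or more spaces, and `parse_rows` and `slowdown` are hypothetical helper names, not part of autoprof.

```python
import re

# Rows copied from the successful test_abs.py summary above.
SUMMARY = """\
  abs   ones(1, 128, 28, 28)    OF    6.1       -         61.6        65.1        52.7         56.0
  abs   ones(16, 128, 28, 28)   OF    37.9      -         855.6       860.3       848.0        852.7
  abs   ones(1, 128, 28, 28)    PT    1.4       -         28.1        32.1        303.1        311.2
  abs   ones(16, 128, 28, 28)   PT    16.4      -         604.2       609.6       1291.2       1300.2
"""

COLUMNS = ["KT(GPU)", "BW(GPU)", "KT(1 CPU)", "ET(1 CPU)", "KT(32 CPU)", "ET(32 CPU)"]


def parse_rows(text):
    """Return {(op, args, lib): {column: float or None}} from summary rows."""
    rows = {}
    for line in text.splitlines():
        # Assumption: cells are separated by runs of two or more spaces.
        cells = re.split(r"\s{2,}", line.strip())
        if len(cells) != 3 + len(COLUMNS):
            continue
        op, args, lib = cells[:3]
        try:
            values = [None if c == "-" else float(c) for c in cells[3:]]
        except ValueError:
            continue  # header or rule line, not a data row
        rows[(op, args, lib)] = dict(zip(COLUMNS, values))
    return rows


def slowdown(rows, column):
    """Return {(op, args): OF time / PT time} for rows present in both libs."""
    ratios = {}
    for (op, args, lib), values in rows.items():
        if lib != "OF":
            continue
        other = rows.get((op, args, "PT"))
        if other and values[column] is not None and other[column]:
            ratios[(op, args)] = values[column] / other[column]
    return ratios


if __name__ == "__main__":
    for (op, args), ratio in slowdown(parse_rows(SUMMARY), "KT(GPU)").items():
        print(f"{op} {args}: OF/PT = {ratio:.2f}")
```

On these rows the OneFlow GPU kernel time for abs is roughly 4.4x the PyTorch one for the small input and 2.3x for the large one. A single run is noisy, so such ratios are only indicative.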

laoliu97 avatar Aug 10 '22 07:08 laoliu97

Profiling logsigmoid, selu and sigmoid raises the following AssertionError:

======================================================================
FAIL: profile_logsigmoid (test_activation.TestLogSigmoidModule)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
    res = f(*args, **kwargs)
  File "/workspace/oneflow/python/oneflow/test/modules/test_activation.py", line 500, in profile_logsigmoid
    torch.nn.functional.logsigmoid(torch.ones(1, 128, 28, 28))
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
    *args, **kwargs
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 228, in profiled_op
    return _profiler_hook(result)
AssertionError

======================================================================
FAIL: profile_selu (test_activation.TestSeluModule)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
    res = f(*args, **kwargs)
  File "/workspace/oneflow/python/oneflow/test/modules/test_activation.py", line 760, in profile_selu
    torch.nn.functional.selu(torch.ones(1, 128, 28, 28))
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
    *args, **kwargs
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 228, in profiled_op
    return _profiler_hook(result)
AssertionError

======================================================================
FAIL: profile_sigmoid (test_activation.TestSigmoidModule)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
    res = f(*args, **kwargs)
  File "/workspace/oneflow/python/oneflow/test/modules/test_activation.py", line 326, in profile_sigmoid
    torch.sigmoid(torch.ones(1, 128, 28, 28))
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
    *args, **kwargs
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 228, in profiled_op
    return _profiler_hook(result)
AssertionError

----------------------------------------------------------------------
Ran 22 tests in 833.476s
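The same AssertionError went away on rerun for test_abs.py, so one way to check whether these failures are also intermittent is to repeat the command and count failing runs. This is a minimal sketch: `count_failures` is a hypothetical helper, and it assumes a failing run either exits non-zero or prints unittest's "FAILED (" marker, since the exit code of autoprof is not shown in this thread.

```python
import subprocess
import sys


def count_failures(command, runs=5):
    """Run `command` `runs` times and count the runs that look like failures."""
    failures = 0
    for _ in range(runs):
        proc = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # unittest reports on stderr
            text=True,
        )
        # Assumption: non-zero exit or a "FAILED (" line marks a failed run.
        if proc.returncode != 0 or "FAILED (" in proc.stdout:
            failures += 1
    return failures


if __name__ == "__main__":
    # Stand-in command so the demo runs anywhere; replace it with the
    # autoprof invocation used in this thread, e.g.
    # ["python3", "-m", "oneflow.autoprof", "test_abs.py"].
    demo = [sys.executable, "-c", "print('OK')"]
    print(count_failures(demo, runs=3), "of 3 runs failed")
```

A failure count strictly between zero and `runs` would point to a flaky check rather than a deterministic bug.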

laoliu97 avatar Aug 10 '22 10:08 laoliu97

oneflow.argmin interface: assert error

Problem description: a test written following the torch documentation fails with an assert error. Test code:

a=torch.ones(4,4)
torch.argmin(a)

Error:

(base) root@training-webide-b81f61-b81f61-webide-master-0:/workspace/oneflow/python/oneflow/test/modules# python3 -m oneflow.autoprof test_argmin.TestArgmin.profile_argmin
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
STAGE:2022-08-15 16:12:50 135230:135230 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 16:12:50 135230:135230 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 16:12:50 135230:135230 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-15 16:12:51 135230:135230 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 16:12:51 135230:135230 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 16:12:51 135230:135230 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-15 16:12:51 135230:135230 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 16:12:51 135230:135230 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 16:12:51 135230:135230 output_membuf.h:71] Completed Stage: Post Processing
F
======================================================================
FAIL: profile_argmin (test_argmin.TestArgmin)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
    res = f(*args, **kwargs)
  File "/workspace/oneflow/python/oneflow/test/modules/test_argmin.py", line 107, in profile_argmin
    torch.argmin(a)
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
    *args, **kwargs
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 228, in profiled_op
    return _profiler_hook(result)
AssertionError

----------------------------------------------------------------------
Ran 1 test in 6.248s

FAILED (failures=1)
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
.
----------------------------------------------------------------------
Ran 1 test in 10.157s

OK
----------------------------------------------------------------------
Summary ("KT" means "Kernel Time", "ET" means "End-to-end Time", in microseconds; "BW" means "Bandwidth" in GB/s):
                                                                                                    
  OP       Args          Lib   KT(GPU)   BW(GPU)   KT(1 CPU)   ET(1 CPU)   KT(32 CPU)   ET(32 CPU)  
 ────────────────────────────────────────────────────────────────────────────────────────────────── 
  argmin   randn(4, 4)   PT    2.0       -         9.3         13.0        11.8         16.2        

laoliu97 avatar Aug 15 '22 08:08 laoliu97

oneflow.nn.BatchNorm1d etc.: module type becomes NoneType during profile tests

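Problem description: profiling this family of interfaces fails with TypeError: 'NoneType' object is not callable, whereas running the same code on the command line prints the expected module type.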

Test program


    @profile(torch.nn.BatchNorm2d) 
    def profile_BatchNorm2d(test_case):
        m1 = torch.nn.BatchNorm2d(10)
        m2 = torch.nn.BatchNorm2d(10, affine=False)
        print(type(m1),type(m2))
        input1 = torch.ones(2, 10, 8, 3)
        input2 = torch.ones(2, 10, 8, 3)
        out1=m1(input1)
        out2=m2(input2)

Error output

(base) root@training-webide-b81f61-b81f61-webide-master-0:/workspace/oneflow/python/oneflow/test/modules# python3 -m oneflow.autoprof test_batchnorm.TestBatchNormModule.profile_BatchNorm2d
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
STAGE:2022-08-15 17:48:23 150614:150614 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 17:48:26 150614:150614 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 17:48:26 150614:150614 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-15 17:48:26 150614:150614 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 17:48:26 150614:150614 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 17:48:26 150614:150614 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-15 17:48:27 150614:150614 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 17:48:27 150614:150614 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 17:48:27 150614:150614 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-15 17:48:27 150614:150614 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 17:48:27 150614:150614 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 17:48:27 150614:150614 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-15 17:48:27 150614:150614 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 17:48:28 150614:150614 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 17:48:28 150614:150614 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-15 17:48:28 150614:150614 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-15 17:48:28 150614:150614 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-15 17:48:28 150614:150614 output_membuf.h:71] Completed Stage: Post Processing
<class 'NoneType'> <class 'NoneType'>
E
======================================================================
ERROR: profile_BatchNorm2d (test_batchnorm.TestBatchNormModule)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
    res = f(*args, **kwargs)
  File "/workspace/oneflow/python/oneflow/test/modules/test_batchnorm.py", line 83, in profile_BatchNorm2d
    out1=m1(input1)
TypeError: 'NoneType' object is not callable

----------------------------------------------------------------------
Ran 1 test in 5.203s

FAILED (errors=1)
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
<class 'NoneType'> <class 'NoneType'>
E
======================================================================
ERROR: profile_BatchNorm2d (test_batchnorm.TestBatchNormModule)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
    res = f(*args, **kwargs)
  File "/workspace/oneflow/python/oneflow/test/modules/test_batchnorm.py", line 83, in profile_BatchNorm2d
    out1=m1(input1)
TypeError: 'NoneType' object is not callable

----------------------------------------------------------------------
Ran 1 test in 11.976s

FAILED (errors=1)
----------------------------------------------------------------------
Summary ("KT" means "Kernel Time", "ET" means "End-to-end Time", in microseconds; "BW" means "Bandwidth" in GB/s):
                                                                                                                 
  OP               Args               Lib   KT(GPU)   BW(GPU)   KT(1 CPU)   ET(1 CPU)   KT(32 CPU)   ET(32 CPU)  
 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
  nn.BatchNorm2d   10                 OF    -         -         12.2        244.4       12.6         245.4       
  nn.BatchNorm2d   10, affine=False   OF    -         -         9.1         191.8       9.2          191.0       
  nn.BatchNorm2d   10                 PT    -         -         21.6        190.8       21.2         165.8       
  nn.BatchNorm2d   10, affine=False   PT    -         -         10.9        69.7        16.1         212.4       

laoliu97 avatar Aug 15 '22 09:08 laoliu97

oneflow.bernoulli: profile test fails with Error Type: oneflow.ErrorProto.check_failed_error

Problem description: profiling this interface fails with Error Type: oneflow.ErrorProto.check_failed_error

Test program

    @profile(torch.bernoulli) 
    def profile_bernoulli(test_case):
        torch.bernoulli(torch.ones(3, 3))
        torch.bernoulli(torch.zeros(3, 3))

Error output

(base) root@training-webide-b81f61-b81f61-webide-master-0:/workspace/oneflow/python/oneflow/test/modules# python3 -m oneflow.autoprof test_bernoulli.TestBernoulli.profile_bernoulli
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
E
======================================================================
ERROR: profile_bernoulli (test_bernoulli.TestBernoulli)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
    res = f(*args, **kwargs)
  File "/workspace/oneflow/python/oneflow/test/modules/test_bernoulli.py", line 73, in profile_bernoulli
    torch.bernoulli(torch.ones(3, 3))
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
    *args, **kwargs
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 208, in profiled_op
    additional_description,
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 147, in run_flow
    op(*args, **kwargs)
RuntimeError: Check failed: (device.type()) == ("cpu") (cuda vs cpu) 
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 143, in Dispatch<oneflow::one::Tensor>
    Dispatch<TensorTuple>(op_expr, inputs, ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 134, in Dispatch<oneflow::one::TensorTuple>
    Dispatch(op_expr, processor.inputs(), outputs.get(), ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter.cpp", line 96, in Apply
    internal_->Apply(op_expr, inputs, outputs, ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/eager_local_op_interpreter.cpp", line 83, in NaiveInterpret
    [&]() -> Maybe<const LocalTensorInferResult> { LocalTensorMetaInferArgs ... Data_YouAreNotAllowedToCallThisFuncOutsideThisFile(); }()
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/eager_local_op_interpreter.cpp", line 83, in operator()
    user_op_expr.mut_local_tensor_infer_cache()->GetOrInfer(infer_args)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/local_tensor_infer_cache.cpp", line 199, in GetOrInfer
    Infer(*user_op_expr, infer_args)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/local_tensor_infer_cache.cpp", line 158, in Infer
    CheckIsDeviceSupportedByOp(*default_device, user_op_expr.op_type_name())
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/local_tensor_infer_cache.cpp", line 31, in CheckIsDeviceSupportedByOp
    
Error Type: oneflow.ErrorProto.check_failed_error

----------------------------------------------------------------------
Ran 1 test in 2.459s

FAILED (errors=1)
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
.
----------------------------------------------------------------------
Ran 1 test in 7.622s

OK
----------------------------------------------------------------------
Summary ("KT" means "Kernel Time", "ET" means "End-to-end Time", in microseconds; "BW" means "Bandwidth" in GB/s):
                                                                                                       
  OP          Args          Lib   KT(GPU)   BW(GPU)   KT(1 CPU)   ET(1 CPU)   KT(32 CPU)   ET(32 CPU)  
 ───────────────────────────────────────────────────────────────────────────────────────────────────── 
  bernoulli   ones(3, 3)    PT    3.0       -         13.2        18.1        5.4          7.4         
  bernoulli   zeros(3, 3)   PT    3.0       -         5.3         7.2         5.1          6.9         
                                                                                                       

laoliu97 avatar Aug 15 '22 10:08 laoliu97

oneflow.nn.BatchNorm1d etc.: module type becomes NoneType during profile tests

This is not a functional interface, which autoprof does not support yet. We also do not yet have an F.batch_norm that is aligned with PyTorch, so ignore this one for now.

daquexian avatar Aug 15 '22 11:08 daquexian

oneflow.bernoulli: profile test fails with Error Type: oneflow.ErrorProto.check_failed_error

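This happens because flow.bernoulli does not support CUDA tensors. I have reported it to the relevant colleagues. You can skip it for now; I will extend autoprof so that it can profile on CPU only.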

daquexian avatar Aug 16 '22 02:08 daquexian

test_cosine_similarity raises AssertionError

Minimal reproduction code

    @profile(torch.nn.functional.cosine_similarity)
    def profile_cosine_similarity(test_case):
        input1 = torch.ones(100,128)
        input2 = torch.ones(100,128)
        torch.nn.functional.cosine_similarity(input1, input2)
        torch.nn.functional.cosine_similarity(input1, input2, dim=0) 

Error and output

(base) root@training-webide-b81f61-b81f61-webide-master-0:/workspace/oneflow/python/oneflow/test/modules# python3 -m oneflow.autoprof test_cosine_similarity.TestCosineSimilarity.profile_cosine_similarity
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
STAGE:2022-08-29 10:35:35 421357:421357 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-29 10:35:35 421357:421357 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-29 10:35:35 421357:421357 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-29 10:35:54 421357:421357 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-29 10:35:55 421357:421357 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-29 10:35:55 421357:421357 output_membuf.h:71] Completed Stage: Post Processing
STAGE:2022-08-29 10:35:55 421357:421357 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-08-29 10:35:57 421357:421357 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-08-29 10:35:57 421357:421357 output_membuf.h:71] Completed Stage: Post Processing
F
======================================================================
FAIL: profile_cosine_similarity (test_cosine_similarity.TestCosineSimilarity)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 253, in new_f
    res = f(*args, **kwargs)
  File "/workspace/oneflow/python/oneflow/test/modules/test_cosine_similarity.py", line 67, in profile_cosine_similarity
    torch.nn.functional.cosine_similarity(input1, input2)
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 760, in method
    *args, **kwargs
  File "/usr/local/miniconda3/lib/python3.7/site-packages/oneflow/test_utils/automated_test_util/profiler.py", line 228, in profiled_op
    return _profiler_hook(result)
AssertionError

----------------------------------------------------------------------
Ran 1 test in 26.624s

FAILED (failures=1)
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
.
----------------------------------------------------------------------
Ran 1 test in 23.765s

OK
----------------------------------------------------------------------
Summary ("KT" means "Kernel Time", "ET" means "End-to-end Time", in microseconds; "BW" means "Bandwidth" in GB/s):
                                                                                              
                                                  KT(1       ET(1       KT(32       ET(32     
  OP         Args       Lib   KT(GPU)   BW(GPU)   CPU)       CPU)       CPU)        CPU)      
 ──────────────────────────────────────────────────────────────────────────────────────────── 
  nn.func…   ones(10…   PT    23.6      -         78.0       80.8       2180.6      2189.5    
             128),                                                                            
             ones(10…                                                                         
             128)                                                                             
  nn.func…   ones(10…   PT    29.2      -         76.8       80.8       1401.8      1410.6    
             128),                                                                            
             ones(10…                                                                         
             128),                                                                            
             dim=0       

laoliu97 avatar Aug 29 '22 02:08 laoliu97