[Bug] [RISC-V RVV] avg_pool2d operator shows performance degradation

Open • yanyanyanggg opened this issue 2 months ago • 0 comments

Description

The average pooling operator (avg_pool2d) runs slower when built with the RISC‑V Vector (RVV) extension than when built for the scalar target: the RVV build reaches only 0.621× the speed of the scalar build (14.13 ms vs. 8.78 ms). This suggests suboptimal vectorization for 2D pooling operations.

Steps to Reproduce

  1. Generate the avg_pool2d operator with the following configuration:
params = {
    "dtype": "float32",
    "batch": 14,
    "pool_channels": 23,
    "pool_size": 2,
    "stride": 4,
    "padding": 1,
    "input_height": 99,
    "input_width": 95
}
  2. Export the operator to two targets:

    • RV target (scalar, without vector extension):
      llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c

    • RVV target (with vector extension):
      llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v

  3. Run performance measurement on both targets.
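
The two target strings above differ only in the +v attribute. A minimal sketch of deriving both from one shared base, so that nothing else can drift between the scalar and vector builds; make_target and the constant names are hypothetical helpers for illustration, not part of TVM or of the reporter's scripts:

```python
# Shared part of both target strings, copied from the issue.
BASE = "llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d"
SCALAR_ATTRS = ["+64bit", "+m", "+a", "+f", "+d", "+c"]


def make_target(vector: bool) -> str:
    """Return the RV (scalar) or RVV (vector) target string."""
    attrs = SCALAR_ATTRS + (["+v"] if vector else [])
    return f"{BASE} -mattr={','.join(attrs)}"


rv_target = make_target(vector=False)
rvv_target = make_target(vector=True)

print(rv_target)
print(rvv_target)
```

Either string can then be handed to whatever build step the export helper uses.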

Operator definition code:

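from tvm import relay


def export_avg_pool2d(params, set_dir=None, platform="rv"):
    # Input tensor in NCHW layout: (batch, channels, height, width).
    data = relay.var("data",
                     shape=(params["batch"], params["pool_channels"],
                            params["input_height"], params["input_width"]),
                     dtype=params["dtype"])
    # Square pooling window; same stride and padding on both spatial axes.
    pool = relay.nn.avg_pool2d(
        data,
        pool_size=(params["pool_size"], params["pool_size"]),
        strides=(params["stride"], params["stride"]),
        padding=(params["padding"], params["padding"])
    )
    # export_op is the reporter's own helper (not shown in this issue).
    # The configuration above has no "op_name" key, so an assumed default
    # name is used when it is missing.
    export_op(pool, params.get("op_name", "avg_pool2d"), [data], params, set_dir=set_dir)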

Performance Data

  • RV execution time: 8.779250 ms
  • RVV execution time: 14.134500 ms
  • Acceleration ratio (RV time / RVV time): 0.621 (the RVV build takes ~1.61× as long)
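
The ratio can be re-derived from the two reported timings. A small sketch, using only the numbers listed above; the function name speedup is illustrative:

```python
rv_ms = 8.779250    # scalar (RV) execution time from the issue
rvv_ms = 14.134500  # vector (RVV) execution time from the issue


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Values below 1.0 mean the candidate is slower than the baseline."""
    return baseline_ms / candidate_ms


ratio = speedup(rv_ms, rvv_ms)
slowdown = rvv_ms / rv_ms

print(f"RV/RVV ratio: {ratio:.3f}")
print(f"RVV takes {slowdown:.2f}x as long as RV")
```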

Environment Information

  • TVM version: 0.19.0
  • LLVM version: not provided (obtainable with llvm-config --version)
  • Hardware: Spacemit K1‑X bit‑brick board
  • CPU: Spacemit X60 (8 cores, 1.6 GHz)
  • ISA: rv64imafdcv (with vector extensions)
  • Memory: 7.6 GB
  • OS: Bianbu 2.2, Linux kernel 6.6.63
  • Operation: 2×2 average pooling with stride 4 on input shape (14, 23, 99, 95)

Expected Behavior

RVV vectorization should provide a performance improvement over the scalar RV baseline for 2D pooling operations like avg_pool2d.

Additional Context

  • The operation performs 2×2 average pooling with stride 4 and padding 1 on a 4D tensor.
  • The slowdown suggests that the vectorized 2D pooling code may use inefficient memory access patterns, or make suboptimal use of vector instructions when reducing over each pooling window.
  • This is part of a broader pattern in which multiple operators run slower with RVV enabled, which points to potential issues in the vectorization strategy for 2D operations.
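
To make the access pattern concrete, the sketch below computes the output shape and the input columns that one output row actually reads. It assumes floor-mode output sizing (ceil_mode disabled, which is believed to be the avg_pool2d default) and is plain Python for illustration, not TVM's generated code; the helper names are hypothetical:

```python
def pool_out_dim(size, kernel, stride, pad):
    """Output length along one axis for floor-mode pooling."""
    return (size + 2 * pad - kernel) // stride + 1


def touched_columns(width, kernel, stride, pad):
    """Input columns read while producing one output row."""
    cols = []
    for ow in range(pool_out_dim(width, kernel, stride, pad)):
        start = ow * stride - pad  # leftmost column of this window
        for k in range(kernel):
            col = start + k
            if 0 <= col < width:  # skip positions that fall in the padding
                cols.append(col)
    return cols


out_h = pool_out_dim(99, 2, 4, 1)
out_w = pool_out_dim(95, 2, 4, 1)
cols = touched_columns(95, 2, 4, 1)

print("output shape:", (14, 23, out_h, out_w))
print("first columns read:", cols[:7])
print(f"{len(cols)} of 95 input columns are read per row")
```

With these parameters the windows do not overlap, and consecutive windows start 4 elements (16 bytes for float32) apart. Mapping one vector lane to each output column would therefore imply strided rather than unit-stride loads. Whether this explains the measured slowdown has not been verified.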

Posted by yanyanyanggg, Dec 09 '25 04:12