[Bug] [RISC-V RVV] Performance Issue: bias_add operator slower with vectorization

yanyanyanggg opened this issue 2 months ago


Description

The bias_add operator shows a significant performance regression when compiled with the RISC-V Vector (RVV) extension. The measured acceleration ratio (RV/RVV) is 0.360, which means the RVV build is about 2.8× slower than the scalar build. This is unexpected for a channel-wise addition, an elementwise operation that should benefit from vectorization.

Steps to Reproduce

Generate the bias_add operator with the following configuration:

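```python
params = {
    "dtype": "float32",
    "batch": 14,
    "channels": 23,
    "input_height": 67,
    "input_width": 99,
    "op_name": "bias_add",  # assumed value; export_bias_add below reads params["op_name"]
}
```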
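For scale, this configuration corresponds to roughly 2.1 million float32 elements per tensor. The short sketch below (illustrative, not part of the original report) does the arithmetic, assuming the NCHW layout implied by the shape and bias_add's default axis=1:

```python
# Problem size implied by the reproduction parameters (NCHW layout, float32).
params = {
    "dtype": "float32",
    "batch": 14,
    "channels": 23,
    "input_height": 67,
    "input_width": 99,
}

BYTES_PER_FLOAT32 = 4

n, c = params["batch"], params["channels"]
h, w = params["input_height"], params["input_width"]

num_elements = n * c * h * w    # elements in `data` (and in the output)
num_planes = n * c              # number of (batch, channel) planes
plane_size = h * w              # contiguous elements that share one bias value
tensor_bytes = num_elements * BYTES_PER_FLOAT32

print(f"elements per tensor : {num_elements:,}")
print(f"(n, c) planes       : {num_planes}")
print(f"elements per plane  : {plane_size}")
print(f"bytes per tensor    : {tensor_bytes:,} (~{tensor_bytes / 1e6:.2f} MB)")
```

In this layout each of the 322 (batch, channel) planes is a contiguous run of 6,633 floats that all receive the same bias value, so the innermost work is a unit-stride add of one broadcast scalar, a pattern that is normally straightforward to vectorize.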

Export the operator to two targets:

RV target (scalar, without vector extension):

```
llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c
```

RVV target (with vector extension):

```
llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
```
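The two target strings differ only in the trailing `+v` attribute, so the comparison isolates the effect of enabling RVV code generation for the same CPU model and ABI. A small illustrative helper that builds both strings from a shared base (the names `make_target`, `BASE` and `SCALAR_ATTRS` are hypothetical, not TVM APIs):

```python
# Build the two LLVM target strings from this report out of a shared base,
# making explicit that the only difference is the "+v" (RVV) attribute.
BASE = "llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d"
SCALAR_ATTRS = ["+64bit", "+m", "+a", "+f", "+d", "+c"]


def make_target(with_vector: bool) -> str:
    """Return the target string; `with_vector` appends the RVV attribute."""
    attrs = SCALAR_ATTRS + (["+v"] if with_vector else [])
    return f"{BASE} -mattr={','.join(attrs)}"


rv_target = make_target(with_vector=False)
rvv_target = make_target(with_vector=True)
print(rv_target)
print(rvv_target)
```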

Build the operator for each target, run both builds on the board, and compare their execution times.
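The report does not say how the timings were collected. One minimal sketch, assuming each build is wrapped in a Python callable that executes the operator once, is a warm-up-then-median timer (`measure_ms` is a hypothetical helper, not part of TVM or the reporter's harness):

```python
import statistics
import time
from typing import Callable


def measure_ms(run: Callable[[], None], warmup: int = 3, repeat: int = 20) -> float:
    """Return the median wall-clock time of `run()` in milliseconds.

    Warm-up calls are discarded so that first-call effects (page faults,
    cold caches) do not skew the result; the median is less sensitive to
    occasional scheduler noise than the mean.
    """
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        run()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)
```

For example, `run` could wrap a single call to the `run()` method of a TVM graph executor module (`tvm.contrib.graph_executor.GraphModule`) created from each build, with inputs set once beforehand so that only the operator itself is timed.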

Operator definition code:

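```python
from tvm import relay


def export_bias_add(params, set_dir=None, platform="rv"):
    # NCHW input: (batch, channels, height, width)
    data = relay.var("data",
                     shape=(params["batch"], params["channels"],
                            params["input_height"], params["input_width"]),
                     dtype=params["dtype"])
    # One value per channel, broadcast along axis=1 (bias_add's default axis)
    bias = relay.var("bias", shape=(params["channels"],), dtype=params["dtype"])
    bias_add = relay.nn.bias_add(data, bias)
    # export_op is the reporter's own helper (not shown in the issue); params must
    # also supply "op_name". `platform` is not used inside this function as shown.
    export_op(bias_add, params["op_name"], [data, bias], params, set_dir=set_dir)
```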
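To confirm that both builds also produce correct results, their outputs could be compared against a scalar reference. The sketch below is illustrative (the function name is hypothetical and it is not part of the reporter's harness); it operates on a flattened NCHW buffer:

```python
from typing import List, Sequence


def bias_add_nchw_reference(
    data: Sequence[float], bias: Sequence[float], n: int, c: int, h: int, w: int
) -> List[float]:
    """Scalar reference for bias_add on a flattened NCHW tensor (axis=1).

    out[n, c, h, w] = data[n, c, h, w] + bias[c]
    """
    if len(data) != n * c * h * w:
        raise ValueError("data length does not match the NCHW shape")
    if len(bias) != c:
        raise ValueError("bias length must equal the channel count")
    plane = h * w  # contiguous elements per (n, c) plane
    out = [0.0] * len(data)
    for ni in range(n):
        for ci in range(c):
            b = bias[ci]  # one scalar broadcast over the whole plane
            base = (ni * c + ci) * plane
            for i in range(base, base + plane):
                out[i] = data[i] + b
    return out
```

Python floats are 64-bit while the compiled builds add in float32, so comparisons against this reference should use a small tolerance (for example `math.isclose(..., rel_tol=1e-6)`) rather than exact equality.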

Performance Data

RV execution time: 7.683920 ms
RVV execution time: 21.363800 ms
Acceleration ratio (RV/RVV): 0.360 (RVV is ~2.8× slower)
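As a sanity check, the ratio and slowdown can be recomputed from the two timings, and a rough effective memory bandwidth derived. The bandwidth figure is a back-of-the-envelope estimate under stated assumptions (one read of `data`, one write of the output, bias traffic ignored), not a measurement:

```python
# Recompute derived figures from the reported timings.
RV_MS = 7.683920    # scalar build (RV)
RVV_MS = 21.363800  # vector build (RVV)

# Assumption: `data` is read once and the output written once, both float32
# with shape (14, 23, 67, 99); the 23-element bias is negligible and ignored.
ELEMENTS = 14 * 23 * 67 * 99
BYTES_MOVED = 2 * ELEMENTS * 4


def effective_gb_per_s(ms: float) -> float:
    """Bytes moved divided by runtime, in GB/s (1 GB = 1e9 bytes)."""
    return BYTES_MOVED / (ms * 1e-3) / 1e9


ratio = RV_MS / RVV_MS     # acceleration ratio (RV/RVV) as defined in the report
slowdown = RVV_MS / RV_MS  # how many times slower the RVV build is

print(f"ratio = {ratio:.3f}, slowdown = {slowdown:.2f}x")
print(f"RV  ~ {effective_gb_per_s(RV_MS):.2f} GB/s")
print(f"RVV ~ {effective_gb_per_s(RVV_MS):.2f} GB/s")
```

Under these assumptions the scalar build streams roughly 2.2 GB/s and the RVV build roughly 0.8 GB/s. With only one add per 8 bytes of traffic, bias_add has very low arithmetic intensity, so a vectorized build would usually be expected to approach the same memory-bound limit as the scalar one rather than fall well below it.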

Environment Information

TVM version: 0.19.0
LLVM version: [Please provide: llvm-config --version]
Hardware: Spacemit K1‑X bit‑brick board
CPU: Spacemit X60 (8 cores, 1.6 GHz)
ISA: rv64imafdcv (with vector extensions)
Memory: 7.6 GB
OS: Bianbu 2.2, Linux kernel 6.6.63
Operation: Channel‑wise bias addition on a tensor of shape (14, 23, 67, 99)

Expected Behavior

RVV vectorization should provide a performance improvement over the scalar RV baseline for broadcast addition operations like bias_add.

Additional Context

The bias_add operation adds bias[c] to every element of channel c of the 4D NCHW tensor (14 × 23 × 67 × 99 ≈ 2.14M elements in total).
The regression is severe and resembles those observed for other operators (sum, log, relu, etc.).
This suggests that either the current RVV vectorization of broadcast operations is suboptimal, or there are inefficiencies in memory access patterns or instruction selection.
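Written out elementwise, relay.nn.bias_add with its default axis=1 computes the following on an NCHW tensor of shape (N, C, H, W) = (14, 23, 67, 99); the symbols are introduced here for illustration:

```latex
\text{out}_{n,c,h,w} = \text{data}_{n,c,h,w} + \text{bias}_{c},
\qquad N \cdot C \cdot H \cdot W = 14 \cdot 23 \cdot 67 \cdot 99 = 2{,}135{,}826
```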

Dec 09 '25 04:12 yanyanyanggg