[Bug] [RISC-V RVV] Performance Issue: log operator slower on RVV
Description
The log operator exhibits performance regression with the RISC‑V Vector (RVV) extension enabled. The acceleration ratio is 0.328, indicating the RVV implementation is approximately 3× slower than the scalar RV version. This is unexpected for an elementwise mathematical operation that should benefit from vectorization.
Steps to Reproduce
- Generate the log operator with the following configuration:

  params = {
      "dtype": "float32",
      "batch": 14,
      "channels": 23,
      "input_height": 67,
      "input_width": 99
  }
- Export the operator to two targets (a sketch of constructing these targets in TVM follows this list):
  - RV target (scalar, without vector extension):
    llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c
  - RVV target (with vector extension):
    llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
- Run performance measurement on both targets.
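For reference, a minimal sketch of how the two targets above can be constructed with TVM's Python API; the target strings are copied verbatim from the steps, and the variable names are illustrative:

import tvm

# Scalar RV baseline target (no vector extension).
rv_target = tvm.target.Target(
    "llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d "
    "-mattr=+64bit,+m,+a,+f,+d,+c"
)

# RVV target: identical, plus the +v vector extension.
rvv_target = tvm.target.Target(
    "llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d "
    "-mattr=+64bit,+m,+a,+f,+d,+c,+v"
)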
Operator definition code:
from tvm import relay

def export_log(params, set_dir=None, platform="rv"):
    # NCHW float32 input tensor built from the params above.
    data = relay.var("data",
                     shape=(params["batch"], params["channels"],
                            params["input_height"], params["input_width"]),
                     dtype=params["dtype"])
    log_op = relay.log(data)
    # export_op is a project-local helper (not shown) that builds the operator
    # for the selected target and saves the artifacts; params["op_name"] is
    # set elsewhere in the harness.
    export_op(log_op, params["op_name"], [data], params, set_dir=set_dir)
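Since export_op is not shown, here is a minimal stand-in sketch of the build-and-measure step, assuming TVM runs natively on the board; measure_log, the input distribution, and the number/repeat counts are illustrative, not part of the original harness:

import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

def measure_log(target, params, number=10, repeat=3):
    shape = (params["batch"], params["channels"],
             params["input_height"], params["input_width"])
    data = relay.var("data", shape=shape, dtype=params["dtype"])
    mod = tvm.IRModule.from_expr(relay.Function([data], relay.log(data)))

    # Build for the given target (rv_target or rvv_target from above).
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target)

    dev = tvm.cpu()
    module = graph_executor.GraphModule(lib["default"](dev))
    # Positive inputs so log is well defined.
    module.set_input("data",
                     np.random.uniform(1.0, 2.0, shape).astype(params["dtype"]))

    # Mean wall-clock time per run, reported in milliseconds.
    timer = module.module.time_evaluator("run", dev, number=number, repeat=repeat)
    return timer().mean * 1e3

Comparing measure_log(rv_target, params) against measure_log(rvv_target, params) should reproduce the ratio reported below.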
Performance Data
- RV execution time: 13.393300 ms
- RVV execution time: 40.848000 ms
- Acceleration ratio (RV/RVV): 0.328 (RVV is ~3× slower)
Environment Information
- TVM version: 0.19.0
- LLVM version: [please provide output of llvm-config --version]
- Hardware: Spacemit K1-X bit-brick board
- CPU: Spacemit X60 (8 cores, 1.6 GHz)
- ISA: rv64imafdcv (with vector extensions)
- Memory: 7.6 GB
- OS: Bianbu 2.2, Linux kernel 6.6.63
- Input shape: (14, 23, 67, 99) ≈ 2.1M elements
Expected Behavior
RVV vectorization should provide a performance improvement over the scalar RV baseline for elementwise mathematical operations like log.
Additional Context
- The log operation is applied elementwise to a tensor of ~2.1M elements (14 × 23 × 67 × 99 = 2,135,826).
- This severe performance regression suggests inefficient vector code generation or suboptimal use of RVV instructions for mathematical functions.
- Similar regressions are observed across multiple operators (sum, relu, bias_add, sqrt, etc.), indicating a potential systemic issue with RVV vectorization in TVM.
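One way to narrow this down (a diagnostic sketch, not part of the original report): dump the generated assembly for the RVV build and check whether log was actually vectorized, or whether the loop falls back to per-element calls into scalar libm (logf), which would explain vector loads/stores wrapped around scalar math. Here lib is assumed to be the relay.build output for the RVV target from the sketch above:

# Dump the RVV build's assembly and scan it for vector instructions
# versus scalar libm calls.
asm = lib.get_lib().get_source("asm")

uses_rvv_instructions = any(s in asm for s in ("vsetvli", "vle32", "vfadd"))
calls_scalar_logf = "logf" in asm

print("RVV instructions present:", uses_rvv_instructions)
print("scalar logf calls present:", calls_scalar_logf)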