tvm issues

[Bug] an ONNX model after compiling with TVM is inconsistent with ONNXRuntime

### Expected behavior When the same input is fed into the same model, once compiled with TVM and once run in ONNXRuntime, we expect the results to be the same. ### Actual behavior TVM's...
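Checks of this kind are normally done with a numerical tolerance rather than exact equality, since two runtimes rarely agree bit for bit on floating-point results. The sketch below is only an illustration under that assumption: the helper name `outputs_match` and the default tolerances are hypothetical and are not taken from the issue, and the outputs are assumed to be already flattened into plain lists of floats.

```python
import math


def outputs_match(actual, reference, rtol=1e-5, atol=1e-8):
    """Return True if two flat lists of floats agree within a tolerance.

    Hypothetical helper: `actual` would hold the TVM output and
    `reference` the ONNXRuntime output, both flattened to Python lists.
    """
    if len(actual) != len(reference):
        return False
    # math.isclose treats NaN as unequal to everything, so a NaN in
    # either output is reported as a mismatch.
    return all(
        math.isclose(a, r, rel_tol=rtol, abs_tol=atol)
        for a, r in zip(actual, reference)
    )
```

A result of `False` only says the outputs differ by more than the chosen tolerance; it does not say which runtime is wrong.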

luyaor

type: bug

[Dlight] Scheduling Low batch GEMM using GEMV-like rule

2

1. Add a dlight rule, `LowBatchGEMV`, to schedule low-batch GEMM in the same way as GEMV. 2. Fix some issues when lowering low-batch GEMM.

jinhongyii

[Doc] Fixed Docstring usage example in `tvm.ir.make_node`

The provided usage example for `tvm.ir.make_node` has become outdated. Creating an IR node of type `IntImm` requires the field `span`. Compare: [`make_node` Unit Test](https://github.com/apache/tvm/blob/main/tests/python/ir/test_node_reflection.py#L73)
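A minimal sketch of what the updated usage might look like, assuming a TVM installation is available. The wrapper name `make_int_imm` is hypothetical, and the call is modelled on the linked unit test, so treat the exact field values as illustrative rather than as the text of the new docstring.

```python
def make_int_imm(value, dtype="int32"):
    """Build an IntImm node through tvm.ir.make_node (hypothetical wrapper)."""
    # Imported lazily so this module can be loaded without TVM installed.
    import tvm

    # The `span` field has to be passed explicitly; None means that no
    # source location is attached to the node.
    return tvm.ir.make_node("IntImm", dtype=dtype, value=value, span=None)
```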

felix-ro

[KVCache] Support passing in attn_score_scaling_factor into KV cache

3

In GPT-2, attention calculation requires an additional feature `scale_attn_by_inverse_layer_idx`. It provides a scaling factor per attention layer when calculating the attention score, before applying the softmax function. This PR supports...
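The effect of such a per-layer factor can be sketched in plain Python. This is not the KV cache API from the PR: the function name and parameters are hypothetical, and the factor is assumed to be `1 / (layer_idx + 1)` with a zero-based layer index, which is how the Hugging Face GPT-2 implementation applies `scale_attn_by_inverse_layer_idx`.

```python
import math


def attention_weights(query, keys, layer_idx, scale_by_inverse_layer_idx=True):
    """Return softmax attention weights of one query vector over `keys`."""
    head_dim = len(query)
    # Standard scaled dot-product factor.
    scale = 1.0 / math.sqrt(head_dim)
    # Extra per-layer factor (assumed form), applied before the softmax.
    if scale_by_inverse_layer_idx:
        scale /= float(layer_idx + 1)
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    # Numerically stable softmax over the scaled scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the factor shrinks the scores before the softmax, deeper layers produce a flatter attention distribution for the same query and keys.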

rickzx

[AOT][Testing] Print output values on test failure

This commit enhances the AOT test harness to print the "actual" and "reference" values when there is a mismatch. This helps when debugging a failing test. Sample output: ``` Actual,...

lhutton1

[SVE] Change the dtype of Ramp and Broadcast lanes to PrimExpr

4

This change will allow us to express scalable vectors through Ramp and Broadcast nodes, e.g. `vec = tvm.tir.expr.Ramp(0, 1, 4 * tvm.tir.vscale())`. We will use negative values for...

ekalda