tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
> The [last release, v0.17.0](https://github.com/apache/tvm/issues/17122), was proposed at the end of July, and the release day was 25 July; for more detail, refer to the [v0.17.0 release schedule](https://github.com/apache/tvm/issues/17122). It has been almost **three months**...
Introduction: The TVM community has worked since the last release to deliver the following exciting improvements! The main tags are listed below (**bold text indicates areas with substantial progress**):...
The pass `LowerThreadAllreduce` enables efficient block reduction. However, block reduction often requires a large amount of shared memory. The current implementation of `LowerThreadAllreduce` only enables static shared memory reduce...
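Why block reduction is hungry for shared memory can be seen from the classic tree pattern: each thread stages one partial value in a shared buffer, then half of the active threads add in a partner's value at every step. The sketch below simulates that pattern sequentially in plain Python. `block_reduce_sum` and `shared_bytes` are hypothetical helpers for illustration, not TVM APIs, and a real lowering may use warp shuffles to shrink the buffer.

```python
def shared_bytes(block_size: int, dtype_bytes: int) -> int:
    """Scratch space for a naive tree reduction: one shared-memory slot per thread."""
    return block_size * dtype_bytes


def block_reduce_sum(values: list[float]) -> float:
    """Sequentially simulate a power-of-two tree reduction in shared memory."""
    n = len(values)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("block size must be a non-zero power of two")
    shared = list(values)  # each "thread" writes its partial value into shared memory
    stride = n // 2
    while stride > 0:
        # Threads 0..stride-1 each add the value `stride` slots away.
        # On a GPU, a barrier (e.g. __syncthreads) separates these steps.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2
    return shared[0]
```

For a single 1024-thread float32 block, `shared_bytes(1024, 4)` is 4096 bytes; the buffer grows linearly with block size and element width, which is where a statically sized allocation can become a constraint.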
Support [torch.index_fill_](https://pytorch.org/docs/stable/generated/torch.Tensor.index_fill.html)
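The operator fills every slice of the input selected by `index` along dimension `dim` with a scalar `value`. As a rough reference for those semantics, here is a plain-Python sketch over nested lists; `index_fill` is a hypothetical helper, it is out-of-place (PyTorch's `index_fill_` mutates its input), and it does not handle negative dims or tensor-valued `value`.

```python
import copy


def _filled_like(x, value):
    """Return a structure shaped like `x` with every scalar replaced by `value`."""
    if isinstance(x, list):
        return [_filled_like(e, value) for e in x]
    return value


def index_fill(tensor, dim, index, value):
    """Out-of-place index_fill: slices at positions `index` along `dim` become `value`."""
    if dim == 0:
        targets = set(index)
        return [
            _filled_like(e, value) if i in targets else copy.deepcopy(e)
            for i, e in enumerate(tensor)
        ]
    # Recurse into each sub-tensor until the target dimension is reached.
    return [index_fill(e, dim - 1, index, value) for e in tensor]
```

For `x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, `index_fill(x, 1, [0, 2], -1)` gives `[[-1, 2, -1], [-1, 5, -1], [-1, 8, -1]]`, matching the example in the PyTorch documentation.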
Hello, I am currently using auto_scheduler to automatically tune a naive GEMM operator. However, after the tuning is completed, I checked the corresponding assembly code and found that the registers...
I encountered a segmentation fault when applying the `PartitionTransformParams` pass to a Relax IR module that performs tensor concatenation and transposition operations. The segmentation fault occurs during the execution of...
[MetaSchedule] Fix a bug when loading database_tuning_record.json that contains a pad_einsum primitive
When loading from database_tuning_record.json in MetaSchedule (this line: `B_reindex_pad_shared_dyn[v0, v1] = T.if_then_else(v0 < 1, B[v1, v0], T.float16(0))`), the parameter dtype of the `pad_einsum` primitive is read as int64, causing...
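The quoted TIR statement reads a transposed view of `B` and pads the first axis: positions below the original extent (1 in that line) read `B[v1, v0]`, and the padded positions get a zero of the buffer's float16 dtype. The plain-Python sketch below mirrors only those intended semantics, not the loading bug itself; `padded_transpose` is a hypothetical helper, and Python floats stand in for float16.

```python
def padded_transpose(B, padded_extent, pad_value=0.0):
    """Build out[v0][v1] = B[v1][v0] if v0 < original extent else pad_value."""
    rows, cols = len(B), len(B[0])
    if padded_extent < cols:
        raise ValueError("padded extent must be at least the original extent")
    # Each entry mirrors T.if_then_else(v0 < cols, B[v1, v0], pad_value).
    return [
        [B[v1][v0] if v0 < cols else pad_value for v1 in range(rows)]
        for v0 in range(padded_extent)
    ]
```

With `B = [[1.0], [2.0], [3.0]]` and `padded_extent=4`, the first output row is `[1.0, 2.0, 3.0]` and the remaining three rows are all zeros.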
When TVM is built with `USE_MRVL=ON` and the TVM compiler is invoked with only the default (LLVM) target, the command-line processor emits the following error: "Error: Passed --target-mrvl-accelerator_config but did not specify...
As discussed in #17439, the ThreadSync injection phase should be applied when all memory allocations are deterministic.
Lead to Suboptimal Shared Memory Reuse. PR #9341 introduced liveness analysis to merge shared memory allocations, but it places touched-buffer records at the outermost scope (e.g., outer loops) rather...
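Liveness-based merging gives each buffer a live interval and lets buffers whose intervals do not overlap share the same bytes, so recording a touch at an outer loop widens the interval and can block reuse. The sketch below illustrates this with a simple greedy first-fit planner over hypothetical `Buf` records (name, size, first and last statement index); it is an assumed toy model, not TVM's actual pass.

```python
from typing import NamedTuple


class Buf(NamedTuple):
    name: str
    size: int   # bytes
    start: int  # first statement index that touches the buffer
    end: int    # last statement index that touches the buffer (inclusive)


def plan(buffers: list[Buf]) -> tuple[dict[str, int], int]:
    """Greedy first-fit: put each buffer at the lowest offset free among overlapping live buffers."""
    placed: list[tuple[Buf, int]] = []
    for b in sorted(buffers, key=lambda b: (b.start, -b.size)):
        # Byte ranges held by already-placed buffers whose live intervals overlap b's.
        busy = sorted(
            (off, off + p.size)
            for p, off in placed
            if not (p.end < b.start or b.end < p.start)
        )
        offset = 0
        for lo, hi in busy:
            if offset + b.size <= lo:
                break  # b fits in the gap before this busy range
            offset = max(offset, hi)
        placed.append((b, offset))
    total = max((off + p.size for p, off in placed), default=0)
    return {p.name: off for p, off in placed}, total
```

With tight intervals, `plan([Buf("A", 1024, 0, 1), Buf("B", 1024, 2, 3)])` puts both buffers at offset 0 for 1024 bytes in total; widening both intervals to the enclosing loop, `(0, 3)`, forces `B` to offset 1024 and doubles the footprint to 2048 bytes.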