Wuwei Lin issues

Results: 9 issues of Wuwei Lin

[TIR] Add pass ManifestSharedMemoryLocalStage

Added a pass to insert a local (cache) stage for shared memory. It's similar to cache read but bypasses the limitation of int set analysis for compacting buffer region by...

[Tracking Issue] Introducing DeclBuffer

### This issue is to track progress for the [RFC Introducing DeclBuffer](https://github.com/apache/tvm-rfcs/pull/70)

- [ ] Introduce DeclBuffer data structure, add corresponding visitors in IR functors. #12300
- [ ] Update...

type:rfc-tracking

Use custom exception types

Currently, we use three macros: [`ASSERT`](https://github.com/shogun-toolbox/shogun/blob/fa5a9b683e980d0a9b637b4ffbaca59d5917cf20/src/shogun/io/SGIO.h#L190), [`REQUIRE`](https://github.com/shogun-toolbox/shogun/blob/fa5a9b683e980d0a9b637b4ffbaca59d5917cf20/src/shogun/io/SGIO.h#L195) and [`SG_ERROR`](https://github.com/shogun-toolbox/shogun/blob/fa5a9b683e980d0a9b637b4ffbaca59d5917cf20/src/shogun/io/SGIO.h#L131). `ASSERT` and `REQUIRE` are [assertions](https://github.com/shogun-toolbox/shogun/wiki/Assertions). `SG_ERROR` is used to throw an exception with a message. All of them throw `ShogunException`....

good first issue

Tag: Cleanup

[TOPI] Add layer norm operator

This PR added a tuple-sum based implementation of layer norm. It performs one-pass reduction to compute mean and variance at the same time. Reducer pattern is also added to allow...
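The one-pass idea can be illustrated outside of TVM. The sketch below is plain Python, not the TOPI implementation: it assumes the reduction carries the pair (sum of x, sum of x squared) and derives the variance as E[x^2] - E[x]^2. The function names, the `eps` default, and the omission of the scale/shift parameters are all illustrative assumptions.

```python
import math


def one_pass_mean_var(xs):
    """Accumulate (sum, sum of squares) in a single loop, then derive mean and variance."""
    n = len(xs)
    total, total_sq = 0.0, 0.0
    for x in xs:
        # Both components of the tuple are updated in the same reduction step.
        total += x
        total_sq += x * x
    mean = total / n
    var = total_sq / n - mean * mean  # population variance: E[x^2] - E[x]^2
    return mean, var


def layer_norm(xs, eps=1e-5):
    """Normalize xs with the one-pass statistics (no learned scale or shift)."""
    mean, var = one_pass_mean_var(xs)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in xs]
```

Compared with a two-pass approach (mean first, then squared deviations), this reads the input once, at the cost of some numerical robustness when the mean is large relative to the spread.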

[DISCUSS] Layout transformation in TIR graph

As we start to work on specific hardware, many operators will expect a specific kind of layout for both data and weight. Logically, the layouts start with simple ones. This...

[Codegen, CUDA] Enable emitting SyncWarp

Fixed sync warp being incorrectly treated as a no-op in CUDA codegen. cc @tqchen

[Contrib] Implement NDArray cache update

This allows existing files to be updated. cc @tqchen

[Codegen, CUDA] Enable emitting SyncWarp

@tqchen

[adapter] Add CoreBench adapter

Dataset PR: https://github.com/laude-institute/terminal-bench-datasets/pull/22