Wuwei Lin
Added a pass to insert a local (cache) stage for shared memory. It is similar to cache read but bypasses the limitation of int set analysis for compacting buffer region by...
### This issue is to track progress for the [RFC Introducing DeclBuffer](https://github.com/apache/tvm-rfcs/pull/70)
- [ ] Introduce DeclBuffer data structure, add corresponding visitors in IR functors. #12300
- [ ] Update...
Currently, we use three macros: [`ASSERT`](https://github.com/shogun-toolbox/shogun/blob/fa5a9b683e980d0a9b637b4ffbaca59d5917cf20/src/shogun/io/SGIO.h#L190), [`REQUIRE`](https://github.com/shogun-toolbox/shogun/blob/fa5a9b683e980d0a9b637b4ffbaca59d5917cf20/src/shogun/io/SGIO.h#L195) and [`SG_ERROR`](https://github.com/shogun-toolbox/shogun/blob/fa5a9b683e980d0a9b637b4ffbaca59d5917cf20/src/shogun/io/SGIO.h#L131). `ASSERT` and `REQUIRE` are [assertions](https://github.com/shogun-toolbox/shogun/wiki/Assertions). `SG_ERROR` is used to throw an exception with some message. All of them throw `ShogunException`....
This PR adds a tuple-sum based implementation of layer norm. It performs a one-pass reduction to compute the mean and variance at the same time. A reducer pattern is also added to allow...
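To illustrate the one-pass idea, here is a minimal plain-Python sketch. It is not the PR's code: it assumes the reduced tuple carries the running sum and the running sum of squares, and the names `one_pass_mean_var` and `layer_norm` are hypothetical.

```python
import math
from functools import reduce


def one_pass_mean_var(xs):
    """Compute mean and variance with a single tuple-valued reduction.

    Assumption: the tuple is (sum of x, sum of x*x); the variance is
    then recovered as E[x^2] - E[x]^2.
    """
    # The combiner updates both accumulators in the same pass over xs.
    total, total_sq = reduce(
        lambda acc, x: (acc[0] + x, acc[1] + x * x),
        xs,
        (0.0, 0.0),
    )
    n = len(xs)
    mean = total / n
    var = total_sq / n - mean * mean
    return mean, var


def layer_norm(xs, eps=1e-5):
    """Normalize xs using the statistics from the one-pass reduction."""
    mean, var = one_pass_mean_var(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


if __name__ == "__main__":
    print(one_pass_mean_var([1.0, 2.0, 3.0, 4.0]))
    print(layer_norm([1.0, 2.0, 3.0, 4.0]))
```

The `E[x^2] - E[x]^2` form reads the input only once, but it can lose precision when the mean is large relative to the spread of the values, so a two-pass or Welford-style update may be preferable in that case.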
As we start to work on specific hardware, many operators would expect a specific kind of layout for both data and weight. Logically, the layouts start with simple ones. This...
Fixed sync warp being incorrectly treated as a no-op in CUDA codegen cc @tqchen
This allows existing files to be updated. cc @tqchen
@tqchen
Dataset PR: https://github.com/laude-institute/terminal-bench-datasets/pull/22