Benoit Jacob
I'm using Tracy with sampling on Android/aarch64. Some of my symbols, which are created by a custom LLVM-based compiler (IREE), are initially not visible to TracySourceView: trying to open...
Not a real pull request. Just to illustrate issue #384.
I have run into the assertion being removed here. Apparently it assumed that a `float` value would print as a specific number of characters, and that assumption was defeated...
I'm running the Tracy UI on Linux, but remotely: my local machine is a Mac and I'm using a remote desktop solution to use my Linux machine. At least in...
to use caching of weights and use the same ordering of the matmul as in other xnnpack benchmarks. I can't test this easily for now but once the Ruy CMakeLists.txt...
This is currently blocked by running into this problem: Issue #9903
# Preamble

ARM NEON has fixed-point multiplication instructions, like `sqdmulh`. Existing NN inference solutions (TFLite, ruy, XNNPACK) use them. It's not possible to match their performance in quantized workloads without...
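For readers unfamiliar with `sqdmulh`, here is a minimal scalar sketch of what one 16-bit lane computes: the high half of the doubled product, with saturation. The function name `sqdmulh_s16` is made up for illustration, and this is only a reference model of the arithmetic, not how any of the libraries above implement it.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1

def sqdmulh_s16(a, b):
    """Scalar model of one 16-bit lane: saturate((2 * a * b) >> 16).

    Hypothetical helper name; models the arithmetic only.
    """
    for x in (a, b):
        if not INT16_MIN <= x <= INT16_MAX:
            raise ValueError("operand out of int16 range")
    # Doubling, then keeping the high 16 bits. Python's >> is an
    # arithmetic shift, so this truncates like taking the high half.
    high = (2 * a * b) >> 16
    # The only overflowing case is a == b == INT16_MIN, which saturates.
    return max(INT16_MIN, min(INT16_MAX, high))
```

Read as Q15 fixed-point, `sqdmulh_s16(16384, 16384)` multiplies 0.5 by 0.5 and yields 8192, i.e. 0.25.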
ARM NEON has pairwise-folding addition instructions where pairs of narrow (e.g. 8-bit) input lanes are added together and accumulated into wider (e.g. 16-bit) integer lanes. For example SADALP, SADDLP. This...
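As a rough illustration of the pairwise add-and-accumulate behaviour described here, the following scalar sketch models the 8-bit to 16-bit case. The name `sadalp_s8` is hypothetical, and the model assumes the accumulator wraps around on overflow rather than saturating.

```python
def sadalp_s8(acc, a):
    """Scalar model: acc[i] += a[2*i] + a[2*i + 1], in 16-bit lanes.

    `acc` holds int16 lanes, `a` holds twice as many int8 lanes.
    Hypothetical helper name; overflow is assumed to wrap.
    """
    if len(a) != 2 * len(acc):
        raise ValueError("need two narrow lanes per wide lane")
    out = []
    for i, c in enumerate(acc):
        s = c + a[2 * i] + a[2 * i + 1]
        # Wrap the sum into the signed 16-bit range.
        out.append(((s + 0x8000) & 0xFFFF) - 0x8000)
    return out
```

For example, `sadalp_s8([0, 10], [1, 2, -3, -4])` folds the pairs (1, 2) and (-3, -4) into the two accumulator lanes.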
This is open-ended. The problem is that many key use cases, such as matrix multiplication kernels, need to know a number of SIMD vector registers that they can count on...
Suppose you have two vectors u and v, and you want to multiply all elements of the vector u by a single lane of the vector v, e.g. v[0]. This...
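The operation being described can be sketched in scalar form as follows. The helper name `mul_by_lane` is made up for illustration; it only shows the intended result, not how any particular instruction set would encode it.

```python
def mul_by_lane(u, v, lane=0):
    """Multiply every element of u by the single element v[lane].

    Hypothetical helper; a scalar model of "multiply by one lane".
    """
    if not 0 <= lane < len(v):
        raise IndexError("lane index out of range")
    scalar = v[lane]  # the one lane of v that is broadcast across u
    return [x * scalar for x in u]
```

With `u = [1, 2, 3, 4]` and `v = [10, 20, 30, 40]`, selecting lane 0 scales every element of `u` by 10.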