oleotiger issues

Results: 16 issues of oleotiger

Is hybrid call stack supported?

**Is your feature request related to a problem? Please describe.** I want to capture the call stack from PyTorch down to the C++ backend. For example, which function is called in C++...

Performance data mistakes in LLAMA inference

[Inference Performance of LLAMA-2 posted by Nvidia 1](https://docs.nvidia.com/nemo-framework/user-guide/latest/performance/llama.html) According to the link above, the inference latency of LLAMA-2-13B with A100 80GB SXM4 at batch size=1 and tp=1 is less than...

bug

GEMM API for efficient LLM inference with W8A16

I want to perform inference on quantized LLAMA (W8A16) on ARM-v9 (with SVE) using oneDNN. The LLAMA weights are per-group quantized. Based on my understanding, I need to prepack the...

enhancement

help wanted

platform:cpu-aarch64

Minimum hardware requirements?

Why JDK7?

What is the max data size which can be processed by netdata-timescale-relay?