xFasterTransformer issues

[Layers] Increased the threshold for enabling flashAttn

performance

Qwen2.5-0.5B-Instruct quantization with gptq error

1

xft version: 1.8.2
lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: GenuineIntel...

wcollin

Bump gradio from 4.37.2 to 5.0.0 in /examples/web_demo

Bumps [gradio](https://github.com/gradio-app/gradio) from 4.37.2 to 5.0.0. Release notes, sourced from gradio's releases: gradio@5.0.0. Features: #8843 6f95286 - No token passed by default in gr.load(); #8843 6f95286 - Adding new themes...

dependabot[bot]

dependencies

undefined reference to Qwen2LLM

6

When I build the source code on Ubuntu 22.04, I hit the issue below. How do I fix it? BTW, I used the main branch; HEAD is e73e4c1ac03f44fe986f34c01bb345e8bc5409b4. ```...

Jasonjunsu

Which instruction set will the CPU use?

When I use xFT to test the qwen3-8B model, the dtype is bf16 and the KV cache is set to fp16. I would like to...

Light-Travlling
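One way to start on a question like this is to check which instruction-set extensions the CPU advertises at all. The sketch below is a minimal illustration that parses the `flags` line of Linux `/proc/cpuinfo` (or the `Flags:` line of `lscpu` output). The helper names `parse_cpu_flags` and `supported_isa` and the chosen flag list are assumptions made for this example; they are not part of xFT. The result only shows what the hardware exposes, not which kernels xFT actually selects for a given dtype or KV-cache setting.

```python
# Minimal sketch: report which ISA extensions a Linux CPU advertises.
# The flag names below are the ones the Linux kernel prints in /proc/cpuinfo;
# the selection of flags is an assumption made for illustration.
ISA_FLAGS = (
    "avx2",
    "avx512f",
    "avx512_bf16",
    "avx512_fp16",
    "amx_tile",
    "amx_bf16",
    "amx_int8",
)


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from /proc/cpuinfo or lscpu text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # /proc/cpuinfo uses "flags", lscpu uses "Flags".
        if sep and key.strip().lower() == "flags":
            return set(value.split())
    return set()


def supported_isa(cpuinfo_text):
    """Map each flag in ISA_FLAGS to True/False for the given text."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in ISA_FLAGS}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not on Linux, or the file is unreadable
    for name, present in supported_isa(text).items():
        print(f"{name}: {'yes' if present else 'no'}")
```

If `amx_bf16` is reported as missing, bf16 compute cannot be using AMX on that machine; whether a present flag is actually used still has to be confirmed from the library itself.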

First-token latency is worse than in other frameworks

12

When I test the model "DeepSeek-R1-Distill-Qwen-7B", the TTFT metric is worse than with OpenVINO. I don't know if this is normal. If so, is there any way to improve this performance? ![Image](https://github.com/user-attachments/assets/68eb8241-5439-48bf-9659-2d4189f0fe5b) Environment: CPU: 2x8592+...

Light-Travlling

Why does the benchmark create multiple processes to measure performance metrics?

1

I have two 8592+ EMR CPUs and four nodes in my system. When running the run_benchmark.sh script with the "-s" parameter, the program just creates four processes, as follows: ![Image](https://github.com/user-attachments/assets/77e8e18d-be23-4dc3-ba2c-d86d48959d16) the...

Light-Travlling

How does the benchmark compute multi-process performance?

1. I used only node 0 to test qwen3-8B: "numactl --all -C 0-31 -m 0 python /home/tzk/AI_Test/xFasterTransformer/benchmark/benchmark.py --model_name qwen3-4B --token_path /data/Model_File/Qwen3-4B --model_path /data/Model_File/Qwen3-4B-xft --prompt_path /home/tzk/AI_Test/xFasterTransformer/benchmark/prompt.json --batch_size 2 --iteration...

Light-Travlling

build: update pyproject.toml cmake<4.0

Build fails. ``` $ pip install . -v Collecting cmake Using cached https://mirror.nju.edu.cn/pypi/web/packages/91/96/2671d7f3612c4449affc956542b25d9193efd8026dbc8ab6b3498f5cede3/cmake-4.0.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.9 MB) ... CMake Error at CMakeLists.txt:15 (cmake_minimum_required): Compatibility with CMake < 3.5 has been removed from...

caterpillar-1

Got a wrong result with DeepSeek-Distill-Qwen-7b while running vllm serving with OMP_NUM_THREADS=16

When running vllm serving with 16 threads using the model DeepSeek-Distill-Qwen-7b, the result is wrong with the prompt below. Versions: xfastertransformer 1.8.2, vllm-xft 0.5.5.0. The result is correct while running 12...

shanzhou2186
