Faster compilation times
Thank you for the awesome project! I am interested in doing some experimentation using this kernel as a base, but the compilation times are quite long. What temporary changes can I make to the codebase to speed it up by compiling only a particular set of kernel template arguments?
I tried modifying https://github.com/flashinfer-ai/flashinfer/blob/main/python/setup.py#L51-L58 and commenting out branches in https://github.com/flashinfer-ai/flashinfer/blob/main/include/flashinfer/utils.cuh, but I still got "symbol not found" errors when loading the C++ extension. Any help would be welcome. Thanks again.
You can use cmake -G Ninja to speed up the build. You can also choose which kernels to compile by enabling or disabling the FLASHINFER_* options in https://github.com/flashinfer-ai/flashinfer/blob/main/CMakeLists.txt. You probably don't need every kernel type for your use case.
Hi @skrider, you can remove the auto-generated files with:
rm -rf csrc/generated/
You can use environment variables to specify which shapes/configurations to compile:
rm -rf build/ # ninja fails to track file changes in `*.cuh` files
FLASHINFER_HEAD_DIMS=64,128 pip install -e .
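As a rough illustration of how a comma-separated variable such as FLASHINFER_HEAD_DIMS could be turned into a list of values to generate kernels for, here is a minimal sketch. The helper name `parse_int_list_env` and the default value are hypothetical; the real setup.py may parse the variable differently.

```python
import os


def parse_int_list_env(name, default):
    """Read a comma-separated list of integers from an environment variable.

    Hypothetical helper for illustration only; not taken from setup.py.
    Falls back to `default` when the variable is unset or empty.
    """
    raw = os.environ.get(name, "")
    if not raw.strip():
        return list(default)
    return [int(tok) for tok in raw.split(",") if tok.strip()]


# Mirrors the shell form `FLASHINFER_HEAD_DIMS=64,128 pip install -e .`
os.environ["FLASHINFER_HEAD_DIMS"] = "64,128"
print(parse_int_list_env("FLASHINFER_HEAD_DIMS", default=[128]))  # [64, 128]
```

Fewer values in the list means fewer template instantiations to generate and compile, which is where the build-time saving would come from.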
Hi @zhyncs, thanks for your suggestions. The CMake files are only used to compile the benchmarks and tests; they are not involved in building the Python wheel.
@yzh119 I can't use flashinfer (on commit https://github.com/flashinfer-ai/flashinfer/commit/b3fef8a0f7f6cf663bb7e4aaeda3d6e82c0b859a) after running the commands below:
rm -rf csrc/generated/
rm -rf build/
FLASHINFER_HEAD_DIMS=128 pip install -e .
After running the commands above, flashinfer installs successfully. But when I run the example script, it raises an undefined symbol error:
Traceback (most recent call last):
File "/home/roy/flashinfer/python/test.py", line 2, in <module>
import flashinfer
File "/home/roy/flashinfer/python/flashinfer/__init__.py", line 17, in <module>
from .decode import (
File "/home/roy/flashinfer/python/flashinfer/decode.py", line 31, in <module>
raise e
File "/home/roy/flashinfer/python/flashinfer/decode.py", line 22, in <module>
from . import _kernels
ImportError: /home/roy/flashinfer/python/flashinfer/_kernels.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN10flashinfer33BatchDecodeWithPagedKVCacheKernelILb1ELNS_15PosEncodingModeE1ELj2ELj4ELj8ELj8ELj1ELj16ELNS_11PageStorageE0ELNS_9QKVLayoutE1E6__halfS4_iEEvPT9_PT11_NS_10paged_kv_tIXT7_EXT8_ES5_S7_EENS_19kv_partition_info_tIS7_EEPT10_SE_Pffff
If FLASHINFER_HEAD_DIMS is not specified, the error goes away and everything works.
Hi @esmeetu @yzh119, how long does it take on average to build from source? My installation seems to get stuck:
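To see which template instantiation the loader is asking for, it can help to pull the kernel name and the unsigned integer template arguments out of the mangled symbol. The sketch below is a deliberately partial reading of the Itanium C++ ABI mangling (length-prefixed names, and `Lj<N>E` for unsigned int literals), not a full demangler; a tool such as `c++filt` gives the complete signature. What each integer position means depends on the kernel's template parameter list in the source, which this sketch does not assume.

```python
import re

# The undefined symbol from the traceback above, split only for readability.
SYMBOL = (
    "_ZN10flashinfer33BatchDecodeWithPagedKVCacheKernel"
    "ILb1ELNS_15PosEncodingModeE1ELj2ELj4ELj8ELj8ELj1ELj16E"
    "LNS_11PageStorageE0ELNS_9QKVLayoutE1E6__halfS4_iEE"
    "vPT9_PT11_NS_10paged_kv_tIXT7_EXT8_ES5_S7_EE"
    "NS_19kv_partition_info_tIS7_EEPT10_SE_Pffff"
)


def nested_name(symbol):
    """Return the leading namespace-qualified name of a `_ZN...` symbol."""
    if not symbol.startswith("_ZN"):
        raise ValueError("expected a nested Itanium-mangled name")
    pos, parts = 3, []
    # Each component is written as <decimal length><identifier>.
    while pos < len(symbol) and symbol[pos].isdigit():
        digits = re.match(r"\d+", symbol[pos:]).group()
        pos += len(digits)
        parts.append(symbol[pos:pos + int(digits)])
        pos += int(digits)
    return "::".join(parts)


def unsigned_template_args(symbol):
    """Return every unsigned int literal (`Lj<N>E`) found in the symbol."""
    return [int(n) for n in re.findall(r"Lj(\d+)E", symbol)]


print(nested_name(SYMBOL))
print(unsigned_template_args(SYMBOL))
```

Comparing these integers against the instantiations that were actually generated under `csrc/generated/` is one way to check whether the restricted build simply never emitted the variant that the dispatch code still references.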
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///app/LLMs/repos/flashinfer/python
Preparing metadata (setup.py) ... done
Installing collected packages: flashinfer
Running setup.py develop for flashinfer
Yes, and I hit the same undefined symbol error as @esmeetu.
@ZSL98 It depends on your CPU performance and your MAX_JOBS setting. You can watch the progress by adding a verbose flag, e.g. pip install -e . -v.
FlashInfer already uses ninja. On my development machine (96-core CPU), compilation takes about 20 minutes, which is acceptable to me for now. If you find other ways to speed up compilation, feel free to submit a PR.
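For scripted builds, the same settings can be assembled in Python before launching pip. This is a minimal sketch: the function name `editable_install_command` is hypothetical, and it assumes the build honors MAX_JOBS as described above. It only constructs the command and environment; the actual run is left commented out.

```python
import os
import sys


def editable_install_command(max_jobs):
    """Build the pip command and environment for a verbose editable install.

    Hypothetical helper for illustration; it does not run anything itself.
    """
    env = dict(os.environ)
    env["MAX_JOBS"] = str(max_jobs)  # cap on parallel compile jobs
    cmd = [sys.executable, "-m", "pip", "install", "-e", ".", "-v"]
    return cmd, env


cmd, env = editable_install_command(max_jobs=os.cpu_count() or 1)
print(" ".join(cmd[1:]), "with MAX_JOBS =", env["MAX_JOBS"])

# To actually build, run from the directory that contains setup.py:
# import subprocess
# subprocess.run(cmd, env=env, check=True)
```

With `-v`, pip streams the underlying build output, so a long compile can be told apart from a genuinely stuck install.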
https://github.com/flashinfer-ai/flashinfer/blob/a23979b330b1868a6787f826d8da877bf6603b8a/python/setup.py#L297-L304