Faster compilation times

Open skrider opened this issue 1 year ago • 5 comments

Thank you for the awesome project! I am interested in experimenting with this kernel as a base, but the compilation times are quite long. What temporary changes can I make to the codebase to speed up the build, for example by compiling only a particular set of kernel template arguments?

I tried modifying https://github.com/flashinfer-ai/flashinfer/blob/main/python/setup.py#L51-L58 and commenting out branches in https://github.com/flashinfer-ai/flashinfer/blob/main/include/flashinfer/utils.cuh, but I still got "symbol not found" errors when loading the C++ extension. Any help would be welcome. Thanks again.

skrider avatar Mar 05 '24 01:03 skrider

You can use cmake -G Ninja to speed up the build. You can also choose which kernels to compile by enabling or disabling the FLASHINFER_* options in https://github.com/flashinfer-ai/flashinfer/blob/main/CMakeLists.txt. You may not need every type of kernel for your use case.

zhyncs avatar Mar 05 '24 05:03 zhyncs

Hi @skrider, you can remove the auto-generated files with:

rm -rf csrc/generated/

You can use environment variables to specify which shapes/configurations to compile:

rm -rf build/ # ninja fails to track file changes in `*.cuh` files
FLASHINFER_HEAD_DIMS=64,128 pip install -e .

Hi @zhyncs, thanks for your suggestions. The CMake files are used to compile the benchmarks and tests; they are not related to building the Python wheel.

yzh119 avatar Mar 05 '24 14:03 yzh119

@yzh119 I can't use flashinfer (at commit https://github.com/flashinfer-ai/flashinfer/commit/b3fef8a0f7f6cf663bb7e4aaeda3d6e82c0b859a) after running the commands below:

rm -rf csrc/generated/
rm -rf build/
FLASHINFER_HEAD_DIMS=128 pip install -e .

After running the commands above, flashinfer installs successfully. But running the example script raises an undefined symbol error:

Traceback (most recent call last):
  File "/home/roy/flashinfer/python/test.py", line 2, in <module>
    import flashinfer
  File "/home/roy/flashinfer/python/flashinfer/__init__.py", line 17, in <module>
    from .decode import (
  File "/home/roy/flashinfer/python/flashinfer/decode.py", line 31, in <module>
    raise e
  File "/home/roy/flashinfer/python/flashinfer/decode.py", line 22, in <module>
    from . import _kernels
ImportError: /home/roy/flashinfer/python/flashinfer/_kernels.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN10flashinfer33BatchDecodeWithPagedKVCacheKernelILb1ELNS_15PosEncodingModeE1ELj2ELj4ELj8ELj8ELj1ELj16ELNS_11PageStorageE0ELNS_9QKVLayoutE1E6__halfS4_iEEvPT9_PT11_NS_10paged_kv_tIXT7_EXT8_ES5_S7_EENS_19kv_partition_info_tIS7_EEPT10_SE_Pffff

If FLASHINFER_HEAD_DIMS is not specified, the error goes away and everything works.
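To see which template instantiation the loader is looking for, the mangled name from the ImportError above can be demangled. This is only a diagnostic sketch, assuming `c++filt` from GNU binutils is installed; if it is not, the name is left mangled. The variable names `sym` and `demangled` are illustrative.

```shell
# Mangled symbol copied verbatim from the ImportError above.
sym='_ZN10flashinfer33BatchDecodeWithPagedKVCacheKernelILb1ELNS_15PosEncodingModeE1ELj2ELj4ELj8ELj8ELj1ELj16ELNS_11PageStorageE0ELNS_9QKVLayoutE1E6__halfS4_iEEvPT9_PT11_NS_10paged_kv_tIXT7_EXT8_ES5_S7_EENS_19kv_partition_info_tIS7_EEPT10_SE_Pffff'

if command -v c++filt >/dev/null 2>&1; then
  # c++filt reads mangled names on stdin and prints the C++ signature.
  demangled=$(printf '%s\n' "$sym" | c++filt)
else
  # Assumption: c++filt may be missing; fall back to the raw name.
  demangled="$sym"
fi

printf '%s\n' "$demangled"
```

The demangled signature lists the template arguments of the kernel that was referenced but never compiled, which may help when comparing against the instantiations that the restricted build actually generated.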

esmeetu avatar Apr 05 '24 02:04 esmeetu

Hi @esmeetu @yzh119 , how long does it take on average to build from source? My installation seems to get stuck:

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///app/LLMs/repos/flashinfer/python
  Preparing metadata (setup.py) ... done
Installing collected packages: flashinfer
  Running setup.py develop for flashinfer

I also get the same undefined symbol error as @esmeetu.

ZSL98 avatar Apr 06 '24 09:04 ZSL98

@ZSL98 It depends on your CPU performance and your MAX_JOBS setting. You can watch the progress by adding the verbose flag, e.g. pip install -e . -v.
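As a rough sketch of setting MAX_JOBS before a verbose build: this assumes the build goes through PyTorch's extension builder, which reads the `MAX_JOBS` environment variable to cap parallel compile jobs. Using half of the cores is an arbitrary choice for illustration, not a project recommendation.

```shell
# Hypothetical heuristic (an assumption): use half of the available cores,
# but never fewer than one job.
cores=$(nproc)
jobs=$(( cores / 2 ))
if [ "$jobs" -lt 1 ]; then
  jobs=1
fi

export MAX_JOBS="$jobs"
echo "Building with MAX_JOBS=$MAX_JOBS"

# Then, from the python/ directory of the checkout, run the verbose install:
#   pip install -e . -v
```

A lower value trades build time for lower peak memory use, which can matter if the machine starts swapping during compilation.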

esmeetu avatar Apr 07 '24 10:04 esmeetu

FlashInfer already uses ninja for the build. On my development machine (96-core CPU), compilation takes about 20 minutes, which is acceptable to me for now. If there are other ways to speed up compilation, feel free to submit a PR.

https://github.com/flashinfer-ai/flashinfer/blob/a23979b330b1868a6787f826d8da877bf6603b8a/python/setup.py#L297-L304

zhyncs avatar Aug 27 '24 06:08 zhyncs