vllm
Use O3 optimization instead of O2 for CUDA compilation?
We currently compile our CUDA kernels with the -O2 flag. We need to investigate whether, and by how much, switching to -O3 affects system performance and compilation time.
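One way to start on the compilation-time half of this question is a small timing script. The sketch below is only an illustration: it assumes `nvcc` is on the PATH and compiles a single standalone source file (the name `kernel.cu` is hypothetical), rather than going through the project's actual build. The helper names `build_nvcc_command`, `time_compile`, and `compare` are made up for this example. It does not measure kernel runtime, which would need a separate benchmark.

```python
import subprocess
import time


def build_nvcc_command(source: str, opt_level: str, output: str) -> list[str]:
    """Return an nvcc command line that compiles `source` to an object file."""
    if opt_level not in ("-O2", "-O3"):
        raise ValueError(f"unexpected optimization level: {opt_level}")
    return ["nvcc", opt_level, "-c", source, "-o", output]


def time_compile(source: str, opt_level: str, repeats: int = 3) -> float:
    """Compile `source` several times; return the best wall-clock time in seconds."""
    # Hypothetical output naming: kernel.cu -> kernel.cu.O2.o / kernel.cu.O3.o
    output = f"{source}.{opt_level.lstrip('-')}.o"
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if nvcc exits with an error.
        subprocess.run(build_nvcc_command(source, opt_level, output), check=True)
        best = min(best, time.perf_counter() - start)
    return best


def compare(source: str) -> dict[str, float]:
    """Time the same source file at both optimization levels."""
    return {level: time_compile(source, level) for level in ("-O2", "-O3")}
```

Calling `compare("kernel.cu")` on a machine with the CUDA toolkit installed would return the best compile time for each flag. Taking the minimum over a few repeats is a design choice to reduce noise from disk caching and other processes.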