[FEATURE]: Build kernel only when executed
Describe the feature
In the current Colossal-AI implementation, the CUDA kernels are built in one of two ways:
- at installation time, by running `CUDA_EXT=1 pip install colossalai`
- when Colossal-AI is imported
However, building all kernels takes at least 5 minutes, which is too long for many users. Moreover, a given user rarely needs every kernel. Building only the kernels that the current program actually requires would therefore best balance user experience against the completeness of the library.
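The "build only what is needed" idea can be sketched as a small registry that compiles a kernel the first time it is requested and caches it afterwards. This is a minimal illustration, not Colossal-AI's implementation: `LazyKernelLoader` and the builder functions are hypothetical names, and in practice each builder might wrap something like PyTorch's `torch.utils.cpp_extension.load`, which compiles C++/CUDA sources on the fly.

```python
import threading
from typing import Any, Callable, Dict


class LazyKernelLoader:
    """Build each kernel the first time it is requested, then reuse it.

    Hypothetical sketch: `builders` maps a kernel name to a zero-argument
    function that compiles the kernel and returns the loaded module.
    """

    def __init__(self, builders: Dict[str, Callable[[], Any]]) -> None:
        self._builders = builders
        self._cache: Dict[str, Any] = {}
        self._lock = threading.Lock()
        self.build_count = 0  # how many kernels have actually been compiled

    def get(self, name: str) -> Any:
        # Fast path: the kernel was already built in this process.
        if name in self._cache:
            return self._cache[name]
        with self._lock:
            # Re-check under the lock so two threads never build the same kernel.
            if name not in self._cache:
                if name not in self._builders:
                    raise KeyError(f"unknown kernel: {name!r}")
                self._cache[name] = self._builders[name]()
                self.build_count += 1
            return self._cache[name]


if __name__ == "__main__":
    def build_fused_optim() -> str:
        # Stand-in for a slow compile step (hypothetical).
        print("compiling fused_optim ...")
        return "fused_optim module"

    def build_layer_norm() -> str:
        print("compiling layer_norm ...")
        return "layer_norm module"

    loader = LazyKernelLoader({"fused_optim": build_fused_optim,
                               "layer_norm": build_layer_norm})
    loader.get("fused_optim")  # compiles on first use
    loader.get("fused_optim")  # served from the cache, no rebuild
    print(loader.build_count)  # 1, because layer_norm was never requested
```

A program that only uses the fused optimizer pays for one compile instead of the full multi-minute build; the double-checked lock keeps concurrent first calls from compiling the same kernel twice.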
Update
It seems that only `CUDA_EXT=1 pip install colossalai` works. If I install without `CUDA_EXT=1`, I get errors. Previously, when I imported it with the colossalai directory in my current working directory (PWD), it started building the extensions. If I import it from anywhere else, it just raises errors.
Please at least add these instructions to the README. This is the error I got:
ImportError: cannot import name 'fused_optim' from 'colossalai._C'
I did not find this documented anywhere. It was only when I happened to type `import colossalai` and it started building the extensions that I realized what was going on, and I feel I have wasted too much time.
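One way to avoid the hard `ImportError` above is to try the precompiled kernel first and fall back to a runtime build only if it is missing. The sketch below assumes, hypothetically, that the prebuilt kernel is importable as a submodule such as `colossalai._C.fused_optim`; `load_kernel` and `fake_jit_build` are illustrative names, not part of Colossal-AI's API.

```python
import importlib
from types import ModuleType
from typing import Callable


def load_kernel(prebuilt_name: str, jit_build: Callable[[], ModuleType]) -> ModuleType:
    """Return the prebuilt extension if it can be imported, else build it now.

    `prebuilt_name` is a dotted module path (assumed layout); `jit_build` is a
    hypothetical callable that compiles the kernel and returns the module.
    """
    try:
        return importlib.import_module(prebuilt_name)
    except ImportError:
        # The precompiled kernel is missing (for example, the package was
        # installed without CUDA_EXT=1), so build it instead of failing.
        return jit_build()


if __name__ == "__main__":
    def fake_jit_build() -> ModuleType:
        # Stand-in for a real runtime compile step (hypothetical).
        print("building fused_optim at runtime ...")
        return ModuleType("fused_optim")

    kernel = load_kernel("colossalai._C.fused_optim", fake_jit_build)
    print(kernel.__name__)
```

`ModuleNotFoundError` is a subclass of `ImportError`, so a single `except ImportError` covers both a missing package and a missing compiled submodule.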
Hi @flymin, I have opened PR #2374 to enable building the kernels at runtime, which should reduce the frustration during installation. Hope it helps.