[BUG]: Cannot build extensions when no gpu device exists

Open ccoulombe opened this issue 1 year ago • 0 comments

🐛 Describe the bug

When no GPU device is present, as on CI or build nodes, no extensions can be built, because torch.cuda.is_available() checks for a usable device rather than whether CUDA itself is installed.

The CUDA libraries are available on such nodes.

This is caused by https://github.com/hpcaitech/ColossalAI/blob/36c4bb2893e73022b1060bd6ad5c0685869e5465/extensions/cuda_extension.py#L30

Proposed solution: honor a FORCE_CUDA environment variable, a common convention for this situation, to enable the build on such nodes.
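
A minimal sketch of how such an override could look, not the actual ColossalAI code. The helper names `cuda_build_enabled` and `_device_present` are hypothetical, and treating only the value "1" as enabling the override is an assumption:

```python
import os


def _device_present() -> bool:
    """Return True if PyTorch reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query, so report no device.
        return False
    return torch.cuda.is_available()


def cuda_build_enabled(environ=None, device_check=_device_present) -> bool:
    """Decide whether CUDA extensions should be built (hypothetical helper).

    FORCE_CUDA=1 bypasses the device check, so the build can proceed on
    nodes that have the CUDA toolkit installed but no GPU attached.
    """
    environ = os.environ if environ is None else environ
    if environ.get("FORCE_CUDA", "0") == "1":
        return True
    # Fall back to the existing behavior: require a visible device.
    return device_check()
```

With a check like this, exporting FORCE_CUDA=1 in the build environment would let the extension build run on a GPU-less node, while leaving the default behavior unchanged when the variable is unset.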

Environment

Linux with CUDA installed, but no GPU device.

ccoulombe, Mar 28 '24 14:03