rules_cuda
Invalid relocatable device link action generated on header-only cuda_library targets with LTO arches
Trying to build a cuda_library target that only has hdrs with --@rules_cuda//cuda:archs="lto_80" (or any LTO arch target) correctly results in no compilation actions but does generate a device link action with zero inputs, which obviously fails.
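A minimal reproduction might look like the sketch below. The target and file names are made up for illustration, and it assumes `cuda_library` is loaded from `@rules_cuda//cuda:defs.bzl`.

```starlark
# BUILD.bazel (illustrative names, not taken from the original report)
load("@rules_cuda//cuda:defs.bzl", "cuda_library")

# Header-only target: no srcs, so no compile actions are expected.
cuda_library(
    name = "kernels_hdrs",
    hdrs = ["kernels.cuh"],
)

# Building it with an LTO arch is what triggers the empty device link:
#   bazel build //:kernels_hdrs --@rules_cuda//cuda:archs="lto_80"
```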
From reading the action implementations, compile.bzl correctly handles the empty srcs case, since its loop never fires on empty inputs. In dlink.bzl, however, a depset of objects and transitive objects gets passed straight to actions.run, with no check for whether the set of direct inputs is empty.
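The missing check could look roughly like the sketch below. It is an illustration only: `maybe_device_link` and the `run_device_link` callback are hypothetical names, not functions from dlink.bzl, and it assumes the rule still has the target's own (direct) objects as a list before they are merged into a depset.

```starlark
def maybe_device_link(ctx, direct_objects, transitive_objects, run_device_link):
    """Hypothetical guard: skip the device link when the target has no objects of its own.

    Args:
      ctx: the rule context.
      direct_objects: list of object Files produced by this target.
      transitive_objects: list of depsets of objects coming from deps.
      run_device_link: hypothetical callback that registers the device link action.

    Returns:
      Whatever run_device_link returns, or None when no action is registered.
    """
    if not direct_objects:
        # Header-only target: there is nothing to link, so register no action.
        return None
    objects = depset(direct_objects, transitive = transitive_objects)
    return run_device_link(ctx, objects)
```

Whether checking only the direct objects is the right condition is a design question for the maintainers; the sketch just mirrors the observation above.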
At the time of the rdc implementation, device linking was extremely annoying when "deep" rdc was involved. When the deps pass in incomplete device objects, the problem is easy: just device link and form a new depset of objects. It becomes annoying when the objects come from transitive deps, because nvlink refuses to consume the complete objects produced by an intermediate device link... So we want a deterministic point where the device link happens. The rdc attr serves that purpose: it signifies that this very target wants a device link action in between. Unfortunately, rdc is a bad name for this. It should have been dlink or something else...
The solution is either
- use cc_library for header only targets with no library deps
- disable rdc on the target; only use it if you want the target to produce objects with relocatable device code for downstream dlink and perform a device link for downstream host link.
See #164
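As a sketch of the first option, the header-only target can be a plain `cc_library`, with rdc set only on the target that actually compiles device code. Target and file names are illustrative, and this assumes `cuda_library` accepts `cc_library` targets in `deps`.

```starlark
# BUILD.bazel (illustrative names)
load("@rules_cc//cc:defs.bzl", "cc_library")
load("@rules_cuda//cuda:defs.bzl", "cuda_library")

# Header-only code goes into cc_library, so rules_cuda never sees
# a target with hdrs but no objects.
cc_library(
    name = "kernels_hdrs",
    hdrs = ["kernels.cuh"],
)

# Only the target that owns .cu sources opts into rdc.
cuda_library(
    name = "kernels",
    srcs = ["kernels.cu"],
    rdc = True,
    deps = [":kernels_hdrs"],
)
```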
I haven't set rdc = True on the header-only target, only on the topmost cuda_library target that consumes it. AFAICT there is the compounding issue that choosing an LTO arch target seems to force rdc to True even if the cuda_library target hasn't set rdc = True.
This seems unsolvable under the current design.
- A true solution: split relocatable device code production (with the rdc attr) from the device linking action (with a new dlink attr)
- A workaround: disable the automatic enabling of rdc for lto_* archs
Both would be breaking changes...
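To make the first proposal concrete, usage might look something like the sketch below. The `dlink` attribute does NOT exist in rules_cuda today; it is the hypothetical new attr from the proposal, and the target names are illustrative.

```starlark
# BUILD.bazel (sketch of a PROPOSED interface, not the current rules_cuda API)
load("@rules_cuda//cuda:defs.bzl", "cuda_library")

# Header-only: even if an lto_* arch forces rdc on, no device link
# would be generated because dlink is not requested here.
cuda_library(
    name = "kernels_hdrs",
    hdrs = ["kernels.cuh"],
)

cuda_library(
    name = "kernels",
    srcs = ["kernels.cu"],
    deps = [":kernels_hdrs"],
    rdc = True,    # produce relocatable device code
    dlink = True,  # HYPOTHETICAL attr: request the device link at this target
)
```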