Invalid relocatable device link action generated on header-only cuda_library targets with LTO arches

Open cjm-dd opened this issue 9 months ago • 4 comments

Building a cuda_library target that only has hdrs with --@rules_cuda//cuda:archs="lto_80" (or any LTO arch) correctly produces no compilation actions, but it still generates a device link action with zero inputs, which fails.
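A minimal reproduction might look like the BUILD fragment below. This is only a sketch: the target name and header file name are hypothetical, and the load path assumes the usual rules_cuda setup.

```python
# BUILD.bazel (Starlark). "header_only" and "header_only.cuh" are
# hypothetical names used for illustration only.
load("@rules_cuda//cuda:defs.bzl", "cuda_library")

cuda_library(
    name = "header_only",
    hdrs = ["header_only.cuh"],  # no srcs, so no compile actions are expected
)

# Build with an LTO arch, as described in the report:
#   bazel build //:header_only --@rules_cuda//cuda:archs="lto_80"
```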

From reading the action implementations, compile.bzl handles the empty srcs case correctly, since its compile loop never fires on empty inputs. In dlink.bzl, however, a depset of objects and transitive objects is passed straight to actions.run, with no check for whether the set of direct inputs is empty.
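The kind of guard that seems to be missing can be sketched as follows. This is not the actual dlink.bzl code: `maybe_device_link` and `run_device_link` are hypothetical names, and the sketch is written in the subset shared by Starlark and Python so the logic can be checked outside Bazel. It relies on an empty depset being falsy in Starlark, just as an empty list is in Python.

```python
def maybe_device_link(objects, run_device_link):
    """Run the device link step only when there is something to link.

    `objects` stands in for the collection of direct and transitive
    device objects; `run_device_link` stands in for whatever registers
    the real action. Both names are hypothetical.
    """
    if not objects:
        # Nothing to link: skip the action instead of running it with no inputs.
        return None
    return run_device_link(objects)
```

With a guard like this, a header-only target would simply produce no device link output rather than a failing action.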

cjm-dd avatar Jan 28 '25 18:01 cjm-dd

When the rdc implementation was written, device linking was extremely annoying whenever "deep" rdc was involved. When a dep passes in incomplete device objects, the problem is easy: just device link and form a new depset of objects. It becomes annoying when the objects come from transitive deps, because nvlink refuses to consume the complete objects produced by an intermediate device link... So we want a deterministic point where the device link happens. The rdc attr serves that purpose: it signifies that this very target wants a device link action in between. rdc is a bad name for this, unfortunately. It should have been dlink or something else...

The solution is either:

  • use cc_library for header-only targets with no library deps, or
  • disable rdc on the target; only enable it if you want the target to produce objects with relocatable device code for downstream dlink and to perform a device link for the downstream host link.
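The first option could look roughly like the fragment below. It is a sketch under assumptions: the target and file names are hypothetical, and it assumes cuda_library accepts a cc_library in deps.

```python
# BUILD.bazel (Starlark). All target and file names are hypothetical.
load("@rules_cc//cc:defs.bzl", "cc_library")
load("@rules_cuda//cuda:defs.bzl", "cuda_library")

# Header-only target declared as a plain cc_library, so no CUDA
# compile or device link actions are created for it.
cc_library(
    name = "header_only",
    hdrs = ["header_only.cuh"],
)

# Topmost CUDA target that consumes the headers and requests the device link.
cuda_library(
    name = "kernels",
    srcs = ["kernels.cu"],
    rdc = True,
    deps = [":header_only"],
)
```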

cloudhan avatar Jan 29 '25 15:01 cloudhan

See #164

cloudhan avatar Jan 29 '25 16:01 cloudhan

I haven't set rdc = True on the header-only target, only on the topmost cuda_library target that consumes it. AFAICT there is the compounding issue that choosing an LTO arch target seems to force rdc to True even if the cuda_library target hasn't set rdc = True.

cjm-dd avatar Jan 29 '25 18:01 cjm-dd

This seems unsolvable under the current setup.

  • A true solution: split relocatable device code production (with rdc attr) and device linking action (with new dlink attr)
  • A workaround: disable lto_* rdc auto enablement

Both would be breaking changes...

cloudhan avatar Feb 06 '25 10:02 cloudhan