cloudhan

Results 191 comments of cloudhan

Seems to be designed to be accessible from `cc_toolchain` directly (https://github.com/bazelbuild/rules_cc/blob/6be85c266b1df3a5bd38290c814d83ee643185dd/cc/private/rules_impl/cc_flags_supplier_lib.bzl#L57-L62). We can then attach it to our `cuda_info` and add a corresponding feature for it.

Sorry for the delay. I was busy working on internal projects. After examining the cc rules implementation (https://github.com/bazelbuild/bazel/blob/a3abc625c78439a5ebf1bb09491627c802f2453d/src/main/starlark/builtins_bzl/common/cc/cc_toolchain_provider_helper.bzl#L198-L204), I think `cc_toolchain.sysroot` is the de facto sysroot that rules_cuda actions can rely...

I think a `--@rules_cuda` flag might be the preferred way to move forward. The current implementation is a brute-force search, and it affects every action invocation. It matches a `"--sysroot="` with...
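For illustration only, the kind of flag scan described above could look roughly like the sketch below. The comment is truncated, so the actual matching logic in rules_cuda is not shown here; the function name `find_sysroot` and the choice to keep the last match are assumptions made for this example, not the rule's real implementation.

```python
from typing import Optional, Sequence

# Prefix quoted in the comment above.
SYSROOT_PREFIX = "--sysroot="


def find_sysroot(flags: Sequence[str]) -> Optional[str]:
    """Scan every flag and return the path of a --sysroot=<path> flag.

    Hypothetical helper: it keeps the last match (an assumption) and
    returns None when no flag starts with the prefix.
    """
    sysroot = None
    for flag in flags:
        # Linear scan over the whole command line, repeated per action.
        if flag.startswith(SYSROOT_PREFIX):
            sysroot = flag[len(SYSROOT_PREFIX):]
    return sysroot
```

Because a scan like this has to run over the full flag list of each action, the cost grows with the number of actions, which is the concern raised above; a dedicated build flag would avoid the search entirely.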

@lukasoyen Another (much better) option would be to add a special rule `cuda_cc_sysroot` that figures out the runtime sysroot, then add the target to the generated @cuda repo and then...

@liuqi123123 If there is no NaN or Inf in Q and K, Q@K (contraction along head_dim) always produces a valid P matrix within the seqlen boundary; for OOB values, they are non-NaN or Inf....

At the time of the rdc implementation, device linking was extremely annoying when "deep" rdc was involved. When a dep passes in incomplete device objects, the problem is easy: just device link...

This seems unsolvable under the current setup.
- A true solution: split relocatable device code production (with the `rdc` attr) and the device linking action (with a new `dlink` attr)
- A workaround: disable...

I think you are configuring it correctly. The root problem is that cross-compilation is not addressed in this rule at the moment. So `exec_compatible_with` for tools and `target_compatible_with` for...